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OG NETGEAR Universal Wifi Adapter 
By Andrew Gauld 

The trend towards increased internet connectivity of me- 
dia devices (TV's, gaming consoles, DVR’s) has brought 
a work-around for one of few my frustrations with BSD 
operating systems — the limited support for newer wire- 
less adapters. Many of these media devices have an eth- 
ernet port, but no way to attach a wireless adapter. Sev- 
eral companies have stepped up to this opportunity and 
have created universal wireless adapters that connect to 
the ethernet port rather than an expansion port. Since the 
device connects to the ethernet port, no driver is needed. 
Since no driver is needed, these devices should work with 
BSD operating systems. In this article, | will test Netgear’s 
Universal Wifi Adapter, model WNCE2001. 


Developers Corner 


10 Automating the Deployment of FreeBSD 
and PC-BSD Systems 


By Kris Moore 

In PC-BSD 9.x every installation is fully-scripted, due to 
the the pc-sysinstall backend. This backend can also be 
used to quickly automate the deployment of FreeBSD 
servers and PC-BSD desktops using a PXE boot envi- 
ronment. In PC-BSD & TrueOS 9.1 and higher, this func- 
tionality is easy to setup and deploy using the “pc-thincli- 
ent” utility. PXE booting allows you to boot systems via 
the LAN interface, as opposed to using traditional media, 
such as DVD or USB. In order for clients to boot via PXE 
they will need a PXE capable network adapter. 


Get Started 
414 Network Concepts, Routing and Fire- 
walls 


By Anders Berggren 
This article is aimed at anyone who wants to learn more 
about networking, routers and firewalls. We will discuss 
this topic in terms of a BSD/PF firewall/router. 


How To 


18 FreeBSD as a NAT Instance in Amazon 
Cloud 
By Andrey aka vand777 
Amazon VPC lets you launch instances in a virtual net- 
work that closely resembles a traditional network that you 
might operate in your own data center. You place pub- 
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licly accessible servers (for example, web servers, DNS 
server etc.) into a public-facing subnet, and place your 
backend systems (databases, application servers etc.) 
in a private subnet with no Internet access. Instances in 
the private subnet can access the Internet only by rout- 
ing their traffic through a NAT instance in a public subnet. 
This article is intended for beginners wanting to install and 
run FreeBSD as a NAT instance in Amazon Virtual Private 
Cloud (Amazon VPC). 


30 PostgreSQL: Indexes (Part 2) 
By Luca Ferrari 

This article continues the previous one, presenting the 
readers with a few index examples and how the access 
costs are computed by the query planner. All the exam- 
ples shown here have been tested on a PostgreSQL 9.1 
cluster running on a FreeBSD 8.2-RELEASE machine; all 
the example source code are available in a GitHub reposi- 
tory. 


4.2 FreeBSD Enterprise Search with Apache 
Solr (Part 3) 


By Rob Somerville 

One of the important facets of enterprise search is to be 
able to search internal (Intranet) and external websites. 
On a smaller scale, it is relatively trivial to assemble some 
code in PHP or Perl to pull web pages from a site, extract 
the links from the HTML and then “wash, rinse, repeat”. 
The difficulty arises when we want to index, rank, and ef- 
fectively manage these results on a large scale. Almost 10 
years ago, Apache Nutch was developed as the key tech- 
nology to crawl 100 million webpages, and has proved 
time and again that it is an efficient scalable solution. 





BSD : 


MAGAZINE 


WHAT’S NEW 





NETGEAR Universal Wifi 


Adapter (WNCE2001) 


The trend towards increased internet connectivity of media 
devices (TV's, gaming consoles, DVR’s) has brought a work-around 
for one of few my frustrations with BSD operating systems — the 
limited support for newer wireless adapters. 


any of these media devices have an ethernet 
\/ port, but no way to attach a wireless adapter. 

Several companies have stepped up to this op- 
portunity and have created universal wireless adapters 
that connect to the ethernet port rather than an expan- 
sion port. 

Since the device connects to the ethernet port, no driv- 
er is needed. Since no driver is needed, these devices 
should work with BSD operating systems. In this article, 
| will test Netgear’s Universal Wifi Adapter, model WNCE 
2001. 

The WNCE2001 is a small unit that connects to 
your computer's ethernet port. For power, you get to 
choose between using an AC wall adapter and us- 
ing a USB port — a great option when power outlets 
are limited. The unit is configured through your internet 
browser. 

The hardware I’ve chosen for this test is the Zotac 
ZBox Plus. This computer does not have an internal DVD 
drive, so | will attach an ASUS external DVD drive, mod- 
el SDRW-08D2S-U, which is powered through a second 
USB port. 

Since the four rear USB ports are now being used by 
the keyboard, mouse and DVD drive, | will power the 
WNCE2001 using the AC wall adapter. 

To test the WNCE2001, | will preconfigure the device 
using a live CD of GhostBSD LXDE 3.0 RC1. 
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Step 1 

| created a bootable CD of GhostBSD LXDE 3.0 RC1. You 
can find download links for GhostBSD images at http:// 
www.ghostbsd.org/. 


Step 2 

| assembled the hardware. | attached the WNCE2001 to 
the ethernet port before | turned the computer on. This 
was to ensure that the ethernet port received an IP ad- 
dress from the WNCE2001 when it turned on the DHCP 
client during bootup. 


Step 3 

| inserted the GhostBSD CD into the external DVD drive 
and turned the computer on. In order to boot up the 
Zbox using the external DVD drive, | had to hit the F11 
key on the keyboard. The Zbox then presented a list of 
medium sources. | selected the external DVD drive and 
hit Enter. 


Step 4 

Wireless configuration: It took several minutes for the 
ZBox to boot up to GhostBSD. When it completed, | 
was presented with the LXDE window manager. | used 
the menu buttons on the bottom panel and looked for a 
virtual terminal application. Upon finding and activating 
LXTerminal, | executed the command ifconfig -a. The 
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output can be viewed in Figure 2. As you can see, the 
network interface reo has been assigned an IP address of 
192.168.1.100. 

As | mentioned before, the WNCE2001 is config- 
ured using an internet browser. | found an icon for the 
browser SeaMonkey on the bottom panel. After click- 
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alo: ftlags=86435<LP, BROADCAST ,RUNNING, SIMPLEX,MULTICAST> metric 0 mtu 2290 
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Figure 2. The ethernet adapter, re0, obtained an IP address via DHCP 
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The BSD Certification Group Inc. 
(BSDCG) is a non-profit organization 
committed to creating and 
maintaining a global certification 
standard for system administration 
on BSD based operating systems. 





@) WHAT CERTIFICATIONS ARE AVAILABLE? 


BSDA: Entry-level certification suited for candidates 
with a general Unix background and at least six months of 
experience with BSD systems. 


BSDP: Advanced certification for senior system administrators 
with at least three years of experience on BSD systems. 
Successful BSDP candidates are able to demonstrate 

strong to expert skills in BSD Unix system administration. 


@ WHERE CAN I GET CERTIFIED? 


We're pleased to announce that after 7 months of 
negotiations and the work required to make the exam 
available in a computer based format, that the BSDA 
exam is now available at several hundred testing centers 
around the world. Paper based BSDA exams cost $75 USD. 
Computer based BSDA exams cost $150 USD. The price of 
the BSDP exams are yet to be determined. 


Payments are made through our registration website: 
https://register.bsdcertification.org//register/payment 


@) WHERE CANIGET MORE INFORMATION? 


More information and links to our mailing lists, LinkedIn 
groups, and Facebook group are available at our website: 
http://www.bsdcertification.org 


Registration for upcoming exam events is available at our 
registration website: 
https://register.bsdcertification.org//register/get-a-bsdcg-id 


WHAT’S NEW 


ing on the SeaMonkey icon, SeaMonkey opened to the 
the NetGear Smart Wizard at http:/www.mywifiext.com/ 
welcome.htm. On this page, | was prompted to select a 
language. | accepted the default (English) and clicked 
‘Continue’. 

The configuration tool searched for available networks. 
On the next page, | was presented a list of networks from 
which to choose. You can see this list in Figure 3. 

After selecting a network and clicking on ‘Continue’, a 
webpage appeared showing the security protocol used 
by the network. | was prompted for a passphrase. | en- 
tered the passphrase and clicked ‘Continue’. The next 
page informed me that the WNCE2001 was connecting 
to the network and that the process would take approxi- 
mately 2 minutes. Once this was done, a webpage ap- 
peared that telling me that the computer was connected 
to the network successfully. The page also showed the 
network name, security protocol and the passphrase | had 
submitted. | clicked on the button, ‘Finish’ and closed the 
browser. To test for internet access, | started SeaMonkey 
again and entered htto:/www.ghostbsd.org/ in the url bar. 
The home page for GhostBSD appeared. 

| selected the LXTerminal window and executed the 
command ifconfig -a. The IP address for reO had been 
changed to 192.168.1.78. 
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Figure 3. The configuration wizard displays available networks 


Sy Gnome meets BSb) 
ei | ea ae 









, : Fey (ic 
default collection! 
it 
ee 





- 
WORM D ra yuh i! oak Lt k - 
is AE i , 


Figure 4. The computer now has wireless internet access 
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Step 5 
Curious as to whether | would have to reconfigure the 
WNCE2001 with every use, even when using the same 
network, | decided to reboot the computer. Just for fun, | 
used a Live DVD of PCBSD 9.0. Booting PCBSD 9.0 on 
the ZBox was uneventful except that the live image was 
copied into memory, which took several minutes. When 
it was done, | was given a choice of KDE, Gnome, LX- 
DE or XFCE. | selected LXDE. When LXDE had loaded, 
| opened the Epiphany web browser. The default page, 
http://www.google.com, appeared. The WNCE2001 did 
not have to be reconfigured after rebooting the com- 
puter. 

lf you ever need to reconfigure the adapter, input http:// 
www.mywifiext.com/welcome.htm into the url bar. This 
was the address of the page that appeared in Step 4 when 
we configured the adapter for the first time. When | en- 
tered it, the first webpage of the NetGear Smart Wizard 
appeared, as in Step 4 above. 


Conclusion 

What have we learned? We have established that the 
use of universal wifi adapters and an ethernet port can 
by pass driver compatibility issues for wireless network 
access. We have also established that the ASUS SDRW- 
O8D2S-U and Zotac ZBox Plus (excluding the internal 
wifi adapter) are compatible with GhostBSD 3.0 RC1 and 
PCBSD 9.0. 


ANDREW GOULD 

Andrew Gould is a Senior Revenue Analyst in the healthcare in- 
dustry. He has been using FreeBSD, PostgreSQL and Python for 
over ten years. 
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Automating 





the Deployment of FreeBSD and PC-BSD® Systems 


In PC-BSD 9.x every installation is fully-scripted, due to the the 
pc-sysinstall backend. This backend can also be used to quickly 
automate the deployment of FreeBSD servers and PC-BSD 
desktops using a PXE boot environment. 





tionality is easy to setup and deploy using the pc- 

thinclient Utility. PXE booting allows you to boot 
systems via the LAN interface, as opposed to using tra- 
ditional media, such as DVD or USB. In order for clients 
to boot via PXE they will need a PXE capable network 
adapter. 


. n PC-BSD & TrueOS™ 9.1 and higher, this func- 


The Initial PXE Setup 

To get started, you will need to have a system with two 
network interfaces running PC-BSD or TrueOS 9.1 and 
a complete ports tree in /usr/ports. If you do not have 
the ports tree installed, you can download it by running 
the command portsnap fetch extract update as root. With 
these pieces in place, open a root prompt and run the pc- 
thinclient command. The first screen you see will look 
something like this: Figure 1. 





Figure 3. Selecting the NIC for DHCPD 
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a6, 199 


Enter “y” to continue, and the following screen will be 
shown: Figure 2. 

In this case you are going to be setting up a PXE instal- 
lation server, so enter “i” to continue. (The “r’ option can 
be used to make your system a X thin-client server. More 
information about this can be found on the wiki page at 
the end of the article). After selecting your type of PXE 


system, the thin-client wizard will then begin to build the 








Figure 4. The finish screen for pc-thinclient 
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Figure 5. Booting the client from PXE to the ThinClient Menu 
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net/isc-dhcp42-server port. You will most likely only need 
the default port options, and can hit enter at any confirma- 
tion screens that appear. Once the port has finished in- 
Stalling, the thin-client setup will prompt you again for the 
PXE network interface to use: Figure 3. 





Listing 1. Sample configuration of pc-sysinstall for a PXE client 


#Sample configuration file for an installation using 


PC-=sSVeinstall 


installMode=fresh 
installinteractive=no 


hostname=examplesystem 


# Set the disk parameters 
disk0=ada0 

partition=all 
bootManager=none 


commitDiskPart 


# Setup the disk label 

# All sizes are expressed in MB 

# Avail FS Types, UFS, UFS+S, UFS+J, ZFS, SWAP 
# Size 0 means use the rest of the slice size 
disk0-part=UFS+SUJ 1000 / 

disk0-part=SWAP 2000 none 

disk0-part=UFS+SUJ 0 /usr 


commitDiskLabel 


# Set if we are installing via optical, USB, or FTP 
installType=FreeBSD 

installMedium=local 

localPath=/installarchive 

packageType=tar 


installFile=fbsd-release.txz 


# Set the root pass 


rootPass=root 


# Setup our users 

# Setup our users 
userName=kris 
userComment=Kris Moore 
userPass=kris 
userShell=/bin/csh 


userHome=/home/kris 





userGroups=wheel, operator 


commitUser 
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Enter the interface name of the network card you wish 
to use as the PXE interface. This interface will be run- 
ning the DHCP server, and should not be connected to an 
existing network with another DHCP server running. The 
wizard will then finish up the configuration for PXE boot- 
ing, and display a message similar to this: Figure 4. 

Your initial PXE setup is now complete! You may now try 
to PXE boot a client connected on the network interface 
you specified. If the client boots successfully, you should 
be presented with an installation screen like Figure 5. 

By selecting the “install” option you will be presented 
with the example configuration option, which can be used 
to install a basic FreeBSD system. (Below we will look at 
adding your own) By selecting it, you will be asked to con- 
firm one last time that you wish to perform the installation 
using this configuration, and then the installation will pro- 
ceed. In addition to performing installs, you can also ac- 
cess an emergency shell prompt. This can be very useful 
if you have a system which can no longer boot, and you 
wish to access the disk or attempt repairs of some kind. 


Customizing the Installation Scripts 

With your initial PXE configuration finished, you will now 
most likely want to create your own installation scripts. 
The thin-client wizard creates an example installation 
script in the /usr/home/thinclient/installscripts/ direc- 
tory, which is where you will want to place your custom 
scripts as well. Any scripts placed in this directory will be 
selectable as an installation option on the PXE client. 

With the location of the install scripts in hand, lets now 
take a look at the provided pc-sysinstall.example file in 
that directory (Listing 1). 

The included comments in the example make most of 
the functionality fairly self-explanatory. However there 
are a few sections we will take a closer look at custom- 
izing. The first place to note is the installMedium=local 
and localPath=/installarchive options. The pc-sysinstall 
backend supports a number of methods of fetching the ar- 
chive files for installation. In this case we are using the lo- 
cal file fbsd-release.txz, stored in /usr/home/thinclient/ 
installarchive. This directory will appear to PXE clients 


PRO Tip! 

By using the Warden utility in PC-BSD / TrueOS, it is possible 
to setup your install environment inside a jail on the host sys- 
tem. Then when you are satisfied with the configuration, you 
can stop the jail and create the install archive on the host sys- 
tem with the tar command as shown below: 


= tar Cvur / usr) home, Lainclienic/instal larchive/ 


Myarchivetx7 =C0/usw/ jails < jail 
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aS /installarchive, allowing you on the host OS to sup- 
ply your own tar archives of any FreeBSD / PC-BSD re- 
lease or configuration you wish. 

After selecting your installation archive options you will 
most likely want to customize the disk layout of the new 
system. In the example provided we are performing a very 
simple full-disk installation to the disk adao, using UFS 
with Soft Updates + Journaling. The resulting installation 
will contain a 1000MB root partition, a 2000MB swap, and 
use the rest of the disk space for /usr as seen Listing 2. 

In addition to UFS support, the pc-sysinstall backend 
can also handle more advanced ZFS disk layouts in a 
similar manner: Listing 3. 





PRO Tip! 

Since every single install with pc-sysinstall is fully scripted, 
a good way to experiment with alternative ZFS layouts is us- 
ing a virtualization tool such as VirtualBox and your PC-BSD in- 
stallation DVD. By running the install GUI, you can customize 
your ZFS disk layout in an easy-to-use manner inside a virtual 
machine. Then when the installation is finished, a copy of the 
saved pc-sysinstall configuration file will be saved onto the in- 
Stalled system at /root/pc-sysinstall.cfg. 














Listing 2. Adjusting the UFS disk layout for this thin-client 


#Set the disk parameters 
disk0=ada0 

partition=all 
bootManager=none 


commitDiskPart 
# Setup the disk label 


# Avail FS Types, UFS, UFS+S, UFS+J, ZFS, SWAP 
# Size 0 means use the rest of the slice size 
disk0-part=UFS+SUJ 1000 / 

disk0-part=SWAP 2000 none 

disk0-part=UFS+SUJ 0 /usr 


commitDiskLabel 


Listing 3. Switching to ZFS for the default client FS 
#Setup the disk label 


# All sizes are expressed in MB 

# Avail FS Types, UFS, UFS+S, UFS+J, ZFS, SWAP 
# Size 0 means use the rest of the slice size 
disk0=part—ZnS 0 7 /Usr,/var, root (minrors adel 


commitDiskLabel 
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In the example above we have changed our disk lay- 
out from UFS to a single ZFS pool using the entire disk, 
and added the disk drive adai1 to the resulting zpool as a 
mirror device. While this example is fairly simplistic much 
more complex layouts can be created using ZFS, includ- 
ing raidz, dataset options and more. 

One last feature to look at with the thinclient utility is the 
ability to perform completely unattended installations. If 
you plan on only doing fully-automated installations via 
this PXE interface, you may do so simply by creating a 
configuration script named unattended.cfg and placing 
it in the /usr/home/thinclient/installscripts/ direc- 
tory alongside the example. When a client first boots it 
will check for this unattended.cfg file, and if found it will 
automatically use it for installation. Some caution should 
be taken when using this method, since simply plugging 
a PXE booting system into the wrong LAN cable could 
cause it to be re-installed. 


Conclusion 

We have briefly looked at some of the ways PC-BSD and 
TrueOS 9.1 make it easy to setup PXE booting and per- 
form rapid deployment of FreeBSD based systems. We 
have also just begun to scratch the surface of the type 
of installations that can be performed with the pc-sysin- 
stall backend. If you want to do further research into more 
complex installations please take a look at the included 
wiki references on pcbsd.org. We also will be happy to 
continue the discussion with you on the PC-BSD mailing 
lists or forums. 


KRIS MOORE 

Kris Moore is the founder and lead developer of PC-BSD. He lives 
with his wife and four children in East Tennessee (USA), and en- 
joys building custom PC’s and gaming in his (limited) spare time. 
kris@pcbsd.org. 
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Network Concepts, 
Routing and Firewalls 


This article is aimed at anyone who wants to learn more about 
networking, routers and firewalls. We will discuss this topic in 


terms of a BSD/PF firewall/router. 


What you will learn... 

¢ Important networking concepts 
¢ Routing basics 

¢ The PF firewall 


lon, building FreeBSD and OpenBSD based network 

appliances, and the Internet is one of the things that 
fascinates me the most. In my experience, a good un- 
derstanding of the basic principles of IP and networking 
is actually the most important asset when it comes to 
master the Internet. Some readers will surely find some 
of the topics in this article trivial, but | still believe that the 
full comprehension of those trivial facts is key to mak- 
ing people more confident in the area of networking. 
Throughout this article, we will learn about IP, network- 
ing and BSD-based firewalls and routers. Kind of a BSD 
routing crash course. 


work as a product designer at a company called Ha- 


OSI Layers 
Let's start by visualizing the oh-so-important network lay- 
ers. Google the “OSI model” to get the full picture. Be- 
cause | will discuss IP primarily, I'll only talk about layer 2 
and layer 3. When | say “layer 2” you should be thinking 
about Ethernet, hardware (MAC) addresses and switch- 
es. You should also think “broadcast”, because it’s one of 
the fundamental features of Ethernet. Whenever you con- 
nect computers to the same switch, they can broadcast 
freely to each other. 

When | say “layer 3”, you should be thinking IP ad- 
dresses, routers, firewalls and the Internet. Whenever 
you have a network-related problem that you find com- 
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What you should know... 
¢ BSD basics 
¢ Network basics 


plex or diffuse, try to tear the problem apart and put the 
different pieces in categories such as the OSI model’s 
layers. Alright, this started out a bit vague, but please 
read on. 


IP Addresses 
ll head directly to the topic of IP addresses, because it 
connects the topics of layer 2 and layer 3 nicely. 

ll start by describing CIDR notation, which | will use 
throughout this article. Have you seen a net mask such as 
259.255.255.0? You probably have. Anyway, it basically 
says that the subnet it defines has 256 addresses. There- 
fore, it’s a quite common net mask for home or small busi- 
ness networks, with fewer than 250-something devices. 
What it really states is that: “on this layer 2 network, ev- 
eryone with the same IP address except for the last digit 
may speak with each other directly”. In other words, if I’m 
192.168.0.2 and you are 192.168.0.3 we can speak di- 
rectly using layer 2 (a switch; never even considering go- 
ing via a router) because the first three digits of our IP ad- 
dress are the same. However, 192.168.0.2 cannot speak 
directly to 192.168.1.3 on a 255.255.255.0 subnet, be- 
cause their third digit differs. So, what’s CIDR then? Well, 
i's actually just the notation of writing “192.168.0.1/24” 
instead of “192.168.0.1 with net mask 255.255.255.0”. It 
means the same thing, but is a lot shorter. In the same 
fashion, 255.255.0.0 (a subnet with 65536 addresses in 
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it) has prefix length 16. Try Googling “CIDR notation” to 
read more, or find a translation table between net masks 
and prefix lengths. Let’s proceed with a few bullet points 
that we can discuss: 


¢ IP addresses are assigned to network interfaces. A 
network interface can be physical (Such as an Ether- 
net adapter) or logical (such as a VLAN or tunnel). 

¢ The IP address and the net mask (or prefix length, as 
it’s also called) defines the “directly reachable” (layer 
2) network in the routing table. 


For example, let’s say | have a BSD computer/router with 
two network interfaces, and assign 192.168.0.1/24 to one 
of them (the device em1, that | call “LAN”). This can be 
done at runtime by running 


# ifconfig eml 192.168.0.1/24 


| also connect this LAN interface with a switch, which 
has a few computers attached to it. These computers 
has IP address such as 192.168.0.2, .3, et.c. If | “ping” 
one of those computers from my “router”, it will find the 
appropriate interface to send out this ping on by first 
looking at the “layer 2” (directly reachable) part of the 
routing table. This is evident by running: 


# netstat -rn -f inet 
Destination Gateway Flags Refs Use Mtu Prio Iface 


192.168.0/24 link#2 UC 5 0 - 4 eml 


I've actually told the router that the addresses 
192.168.0.0 to 192.168.0.255 (defined by my address 
192.168.0.1 and the prefix length 24) are directly reach- 
able using layer 2 on the LAN interface, and the router 
will not even try layer 3 routing to find the destination. In- 
stead, the router makes an ARP request (trying to find 
the IP address’ MAC address) on the LAN interface, and 
if it gets a response, sends the ping to that MAC address 
out on the LAN interface. This surely sounds trivial to 
many, but is actually very important. We can make a few 
important assumptions from this: 


¢ You should almost never assign two address with- 
in the same subnet to two (or more) different inter- 
faces, because it confuses the computer. Let’s say 
I've assigned 192.168.0.1/24 to one interface (LAN) 
and 192.168.0.10/24 to another interface (WAN). 
They are, as defined by the /24 prefix length, in the 
same layer 3 subnet, and thus in the same ‘lay- 
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er 2 domain’. Consequentially, the computer/rout- 
er doesn’t know on which interface to send packets 
to 192.168.0.X, because it has two choices. The ex- 
treme case would be to assign the same IP address 
to two different interfaces, which of course, is equal- 
ly forbidden. There are exceptions, but this a good 
rule of thumb. 

¢ You almost never need to create routes to direct- 
ly connected networks. Too many times I’ve seen 
people trying to solve problems by randomly adding 
routes. If you have an address such as 192.168.0.1/24 
on LAN, you dont need to create a route to 
192.168.0.0/24, the BSD kernel should have added a 
proper layer 2 directly connected route for you. 


Routing 

Anything that isn't directly connected in the sense that we 
discussed in the previous section, IP routes are needed. 
The by far most common route is the default gateway, 
which really is 0.0.0.0/0 in network language. The default 
route is added like: 


# route add default 1.1.1.1 


Here are a few important facts. As usual, they don't al- 
ways apply, but are good rules of thumb. 


¢ The most specific route is chosen. If you have 
one default route via your gateway, and a route to 
10.0.0.0/16 via another one (router 2), a packet to 
10.0.1.1 will be sent to router 2. 

¢ All routes has a gateway, and that gateway has to be 
directly reachable using layer 2. That is, if you on- 
ly have an address such as 192.168.0.2/24 (mak- 
ing 192.168.0.X directly reachable) you cannot add 
a route to 10.1.0.0/24 via 172.16.0.1, because your 
computer doesn’t know how to speak with 172.16.0.1 
using layer 2. A route to 10.1.0.0/24 via 192.168.0.1 
would be valid, however. 

¢ Routing is uni-directional. Just because you can send 
packets to me, doesn't mean | have a route back to 
you via which | can send the responses. One exam- 
ple would be a VPN appliance. Say you have a serv- 
er (192.168.0.2), connected via a router (192.168.0.3) 
to the internet. On the same network as the server, 
you places a VPN appliance (192.168.0.3). To this 
VPN appliance a roaming user connects, and get’s 
the VPN address 10.0.0.2. The VPN user tries to 
ping 192.168.0.2. The packet travels over the VPN 
to the VPN appliance, which sends it to 192.168.0.2. 
It has reaches it's destination. The server howev- 
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er, responds by sending a packet to 10.0.0.2 via its 
default route 192.168.0.1 (because neither the serv- 
er nor the router know about 10.0.0.0/16 being be- 
hind the VPN appliance 192.168.0.3) and the packet 
is lost. The roaming VPN user doesn't get a ping re- 
sponses. However, adding a route to 10.0.0.0/16 via 
192.168.0.3 on the server or the router would have 
solved the issue. 


A router typically forwards packets, and in most BSD op- 
erating systems this is enabled by running 


# sysctl net.inet.ip.forwarding=1 


for |Pv4. Please note that all of the commands | run has 
to be made permanent by adding configuration to the ap- 
propriate files in /etc. This however is very well docu- 
mented in the respective BSD’s manuals, and not the fo- 
cus of this article. 


Dynamic Routing 

For a long time, routers have typically been hardware ap- 
pliances from companies such as Cisco or Juniper. In re- 
cent years however, open source routing has matured to 
the point of being highly suitable even for demanding in- 
stallations. A modern computer with good network adapt- 
ers can easily handle several hundreds of thousands 
packets per second. 

The by far most common application for BGP routing 
is multi-homing; that is, making your services redundant 
via multiple ISPs. This of course requires redundant con- 
nections, an AS number and provider-independent IP ad- 
dresses, which could be fairly expensive to get hold of. 
Ask your ISP for more information. Once that is man- 
aged however, the configuration is not at all that com- 
plicated. In case you're running OpenBGPD, add to 
bgpd.conf 


AS your-As 

network your-network 

neighbor ISPs-router { 
remote-as ISPs-As 

} 

neighbor another-ISP {... 

deny filters... 


Although this is not a copy-and-paste example, it hope- 
fully gives you a glimpse of OpenBGPD’s elegance and 
simplicity. The same goes for OpenOSPFD; it’s ospfd. 
conf format is compact and comprehensible, and the 
program itself works like a charm. 
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Aliases Addresses 

In case you ask your ISP for multiple address, they might 
give you either a few addresses, or a whole network which 
you are expected to be the router for via a link network. 
In the first case, where you're just given a few address- 
es, you can add those to your WAN interface (called emo 
in this example) as alias addresses with a all-ones (/32) 
net mask. The reason to use /32 is to avoid adding layer 
2 routes for the addresses. If 1.1.1.2/27 is your primary 
IP address, and you're also given 1.1.1.3 andi.1.1.4 by 
your ISP, these can be added like: 


# ifconfig em0 1.1.1.2/27 
# ifconfig em0 1.1.1.3/32 
# ifconfig em0 1.1.1.4/32 


The PF Firewall 

Most BSDs ship with the packet filter (PF); a “stateful” lay- 
er 3 (e.g. IP) to layer 4 (e.g. TCP) firewall. It’s dual-stack 
by default, meaning that all rules that doesn’t explicitly 
specify an address family works for both IPv4 and IPvé6. 
This, along with other nice properties of its configuration 
format, makes it (almost) a joy to create rules for. In fact, | 
like the PF configuration format that much we decided to 
use it straight off in our own firewall software. There are a 
few fundamental concepts which are good to know about: 


¢ The firewall’s ruleset is evaluated every time a packet 
goes in or out of an interface 

e Interfaces can be interface groups, hardware ports, 
VLAN interfaces, etc. 

¢ The last matching rule wins, unless a rule is marked 
by the quick keyword (| recommend making rules 
quick) 


| personally prefer to do filtering on interface groups in- 
stead of interface; both for naming them, and grouping 
them. | also find it extremely handy that the interface 
with the default route on it, is automatically added to the 
“egress” group (which can be considered WAN). To add 
the emi interface to a “lan” group, run: 


# ifconfig eml group lan 

Assuming that the WAN interface is in the egress group, 
we could create a simple firewall ruleset performing NAT 
by adding to /etc/pf.conf: 

block Log 


pass on lan 


pass out on egress nat-to (egress) 
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and apply the changes by running 
# pfctl -f /etc/pf.conf 


That’s a pretty compact ruleset, don’t you think? Its 
beautiful syntax also makes it very clear just what it 
does. One could add a port forwarding to this setup very 
simply, by adding to /etc/pf£.conf 


pass in on egress to port 80 rdr-to 192.168.0.100 
Where 192.168.0.100 could be a web server on your LAN. 
The, in my experience, most common pitfall is to under- 
estimate the importance of the rules’ order. For example, if 
using mostly “quick” rules (in which case the first matching 
rule “wins’”) it’s usually appropriate to move more specific 
rules towards the top, and more general rules towards the 
bottom. Consider the example below 


block log 
Pass quick on lan 
pass in quick on lan to (wan) port 80 rdr-to 10.0.0.100 


# doesn’t work 


The function of the third rule in the example above would 
be to redirect traffic headed for the WAN interface’s IP to 
an internal server on a DMZ (sometimes referred to as 
“reflection”, being able to access internal resources on 
external addresses from within the network). That won't 
work however, because that rule will never match. The 
rule just above it, the second rule, is more general, and 
will always “win”. The solution would be to switch places 
of the second and third rule. 

Now that you’ve seen some of PF’s elegance, you could 
go on and read OpenBSD’s excellent PF FAQ. The manu- 
al page for pf.conf is also great once you get the hang of it. 


Debugging and Monitoring 
| will not cover debugging in depth, but give you some 
hints and pointers. The manual pages will do the rest. 

As for general network and firewall debugging, trac- 
ing packets can be a very quick way of locating errors. 
Let's say you have a “case” which you expected to work, 
but which doesn't. You might not reach a web server in a 
DMZ, but other servers in the DMZ are responding. You 
don't know what the problem is, or if it’s even in the fire- 
wall. Anyway, start tracing packets. Make sure you have 
a device as initiator. In this case, we want do debug an 
issue only visible from the “outside”. Therefore, use a 
smartphone connected to cellular data and start trying to 
access the web server while having this command run- 
ning on the firewall: 
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# tcpdump -n -i em0 port 80 and host 1.1.1.2 


where emo iS your external interface and 1.1.1.2 is the IP 
youre trying to access from the smartphone. If you see 
traffic, move on the firewall logging 


# tcpdump -n -e -ttt -i pflogO port 80 


and see if it’s blocked (requires all your block rules to 
have the “log” keyword). If it’s not blocked, move on the 
internal interface where you expect the packets to show 


Up: 
# tcpdump -n -i eml port 80 and host 192.168.0.100 


where emi is your internal interface and 192.168.0.100 
is the IP that you might have done a port forwarding 
to. If nothing shows up, start looking at the firewall ru- 
leset, and make sure that the desired packets are 
in fact matches by your port forward. Perhaps take a 
look at the routing table to make sure that packets to 
192.168.0.X are in fact expected to be sent out on emi. 
And so on. If you however do see packets going out on 
em1, but you dont see the responses, start debugging 
the server. Is it connected? Does it provide any service 
on port 80? Is the server configured with the firewall as 
its default gateway? Otherwise the packets aren't rout- 
ed back. 

Certain debugging and monitoring tasks require you 
to extract information from the system, such as from PF 
or the BGP process. For this, there are handy little pro- 
grams called pfctl, bgpctl, ospfctl, et.c. Google those, 
and you'll find tons of information in manual pages and 
tutorials. 
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as a NAT Instance in Amazon Cloud 


This article is intended for beginners wanting to install and 
run FreeBSD as a NAT instance in Amazon Virtual Private Cloud 


(Amazon VPC). 


What you will learn... 

¢ How to install, configure and update a FreeBSD instance in Amazon 
Cloud, 

¢ How to configure private and public subnets, routing and security 
groups in Amazon VPC, 

¢ How to set up FreeBSD as a NAT instance between public and pri- 
vate subnets. 


mazon VPC lets you launch instances in a virtual 
A network that closely resembles a traditional network 

that you might operate in your own data center. You 
place publicly accessible servers (for example, web serv- 
ers, DNS server etc.) into a public-facing subnet, and place 
your backend systems (databases, application servers etc.) 
in a private subnet with no Internet access. Instances in the 
private subnet can access the Internet only by routing their 
traffic through a NAT instance in a public subnet. In the con- 
figuration wizard Amazon offers its “native”, Linux based, 
instance as the only choice for NAT instance. However, it 
is possible to configure VPC manually and use FreeBSD 
(or any other operating system) for NAT purposes. 


What we will do 


¢ Configure Amazon Virtual Private Cloud (VPC). 

e Launch a FreeBSD instance in a public subnet. 

¢ Configure route table for the private subnet. 

¢ Configure firewall for the FreeBSD instance. 

¢ Perform basic configuration of the FreeBSD instance. 

¢ Apply security patches and install a custom kernel. 

¢ Setup OpenVPN. 

e Launch a Microsoft Windows instance in the private 
subnet. 

¢ Test Remote Desktop Connection to the Microsoft Win- 
dows instance to prove that our NAT instance works. 
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What you should know... 
¢ FreeBSD basics, 
¢« Amazon Cloud basics. 


As the result of the above steps we will end up with: 


e the public (10.0.1.0/24) and private (10.0.2.0/24) sub- 
nets, 
¢ aFreeBSD NAT instance with two network interfaces: 
¢ the first network interface belongs to the public 
subnet and has the IP address 10.0.1.10, 
¢ the second network interface belongs to the pri- 
vate subnet and has the IP address 10.0.2.10. 
¢ a Microsoft Windows instance in the private subnet 
(IP address 10.0.2.20) which can be accessed (Re- 
mote Desktop Connection) through our FreeBSD 
NAT instance. All outgoing traffic of this Microsoft 
Windows instance goes through the FreeBSD NAT 
instance. 


Step 1. 

Configure Amazon Virtual Private Cloud 

In this step we will create and configure VPC (10.0.0.0/16).: 
public (10.0.1.0/24) and private (10.0.2.0/24) subnets, an 
Internet gateway and a route table for the public subnet. 


Task 1. Create VPC 
Log in into AWS Management Console and then choose 
the “VPC” item from the “Services” menu: Figure 1. 

In the “Navigation” pane, click the “Your VPCs’” link, and 
then click the “Create VPC” button: Figure 2. 
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Type 10.0.0.0/16 in the “CIDR Block” field and then 
press the “Yes, Create” button. 


Task 2. Create public and private subnets 

In order to create public and private subnets, click the “Sub- 
nets” link on the “Navigation” pane, and then click the “Cre- 
ate Subnet” button: Figure 3. You can select a desired avail- 
ability zone as well as specify subnet’s IP address block 
(“CIDR Block” field). You should create 10.0.1.0/24 as your 
public subnet, and 10.0.2.0/24 as your private subnet. 


Task 3. Create an Internet Gateway. 

Create and attach an Internet Gateway (click on the “In- 
ternet Gateways’ on the “Navigation” pane, and then click 
the “Create Internet Gateway” button). After creation, se- 
lect the Internet Gateway in the list and then click the “At- 
tach to VPC” button, select your VPC (10.0.0.0/16) and 
then click on the “Yes, Attach” button. 


Task 4. Create a route table for public subnet 
Create anew route table for your public subnet (10.0.1.0/24): 
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Figure 1. Amazon VPC Console Dashboard 





Create VPC Cancel xX 


A VPC is an isolated portion of the AWS cloud populated by AWS objects, 
such as Amazon EC2 instances. Please use the Classless Inter-Domain 
Routing (CIDR) block format to specify your VPC's contiguous IP address 
range, for example, 10.0.0.0/16. Please note that you can create a VPC no 
larger than /16. 


CIDR Block: (e.g. 10.0.0.0/16) 


Tenancy: Default + 











Figure 2. Create VPC 
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¢ Click on the “Route Tables” on the “Navigation” pane. 

¢ Click on the “Create Route Table” button, select your 
VPC (10.0.0.0/16) from the drop-down list, and then 
click “Yes, Create”: Figure 4. 

¢ This new route table will appear in the list (it will have 
“No” in the “Main” column). Select it in the list, and 
then add a new route (Destination: 0.0.0.0/0, Target: 
your Internet Gateway) in the last line, and then click 
“Add”: Figure 5. 

¢ Click on the “Associations” tab and then associate 
this route table with your public subnet (10.0.1.0/24): 
Figure 6. 


Step 2. Launch a FreeBSD Instance in a Public 
Subnet 

I'd like to take this opportunity to thank Colin Percival who 
creates AMI instances of FreeBSD in Amazon Cloud. 


Create Subnet Cancel x 


Please use the CIDR format to specify your subnet's IP address block (e.g., 
10.0.0.0/24). Please note that block sizes must be between a /16 netmask 
and /28 netmask. You can create no more than 20 subnets per VPC. Also, 
please note that a subnet can be the same size as your VPC. 
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Figure 3. Create Subnet 
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Create Route Table 


A route table specifies how packets are forwarded 
between the subnets within your VPC, the internet, 
and your VPN connection. 


VPC: vpc-fc6d1797 (10.0.0.0/16) ~ 


Figure 4. Create Route Table 
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Figure 5. New Route Table Rules 


MAGAZINE 
= 
Xe) 


HOW TO 


You can find available FreeBSD AMIs on his website at http:// 
www.daemonology.net/freebsd-on-ec2/. |n this step we will: 


¢ Launch a FreeBSD instance and setup its security 
group, 

¢ Disable source/destination checks (otherwise, NAT 
will not work), 

¢ Assign an Elastic IP address (EIP) to the NAT in- 
stance. 


Task 1. Launch a FreeBSD instance. 


¢ Select the “EC2” item from the “Services” menu, and 
then click on the “Launch Instance’ button. 

¢ Choose the “Classic” wizard. 

¢ Click on the “Community AMIs” tab, type the desired 
AMI from the http:/www.daemonology.net/freebsd- 
on-ec2/ website in the “Search” field (for example, 
ami-1d4c9874 to install FreeBSD 9.0 RELEASE amd64 
in US East, North Virginia), and then press ENTER: 
Figure /. 

¢ | will install FreeBSD 9.0 RELEASE amd64/HVM as 
this was the latest release version at the time of this 
writing. 

¢ On the next wizard step choose the instance 
type, click on “VPC” and select your public subnet 
(10.0.1.0/24) from the list. Click “Continue”. 

¢ Select “2” in the “Number of Network Interfaces” 
drop-down list, and assign IP addresses to each in- 
terface: 





Routes Asceociations | Route Propagation 
Subnet Pte ale 
subret-e183298a (2000.1.0/24) | Disessociate | 
Select a subnet 


The folowing subnets have not been associated with any route tables and are therefore using the Main table routes: 
* submet-97So2ehe (10.0.2.0/24) 











Figure 6. Route Table Associations 
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Figure 7. FreeBSD Community AMIs 
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¢ 10.0.1.10 for etno (Subnet 10.0.1.0/24), 
¢ 10.0.2.10 for etni (change subnet to 10.0.2.0/24). 
e You can assign secondary IP addresses for each in- 
terface if needed. Then click “Continue” button twice. 
¢ Name your instance and then click “Continue”. 
¢ Create a new key pair or choose the existing one, 
and then click “Continue”. 
¢ Create a new security group (Figure 8): 
¢ Open port 22 for SSH (TCP), 
¢ Open port 1194 for OpenVPN (UDP). 
¢ Click “Launch”. 


Task 2. Disable source/destination checks on the 
NAT instance’s network interfaces 

By default, each instance performs source and destination 
checking. This means the instance must be the source or 
destination of any traffic it sends or receives. For a NAT 
instance to perform network address translation, we must 
disable source/destination check on our FreeBSD instance: 


¢ Click on the “Network Interfaces” on the “Navigation” 
pane. 

¢ Right-click on each of the interfaces of your NAT in- 
stance and choose the “Change Source/Dest Check” 
item from the popup menu: Figure 9. 

¢ You should disable source/destination checks on 
each of the network interfaces of your NAT instance: 
Figure 10. 

¢ | also usually make a note of the private network in- 
terface ID (the one with IP 10.0.2.10). This informa- 
tion will be needed later. 


Task 3. Assign an Elastic IP address (EIP) 
To access your NAT instance from the Internet, you should 
assign an Elastic IP address (EIP) to your NAT instance: 
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Figure 8. Create a New Security Group 


¢ Click on the “Elastic IPs” on the “Navigation” pane. 

¢ Click “Allocate New Address”. 

¢ Select “VPC” from the drop-down list and then click 
“Yes, Allocate”. 

¢ Select the allocated IP address in the list and then 
click on the “Associate Address” button. 

¢ Select your NAT instance in the “Instance” drop-down 
list and select “10.0.1.10*” in the “Private IP address” 
field. Click “Yes, Associate” button. 


Step 3. Configure route table for the private 
subnet 

We will use the main route table to configure routing for 
the private subnet: 


¢ Select the “VPC” item from the “Services” menu. 
¢ Click on the “Route Tables” on the “Navigation” pane. 
¢ Select the main route table and then add a new route: 
¢ Destination: 0.0.0.0/0, 
¢ Target: private interface of your NAT instance (with 
the IP 10.0.2.10). (We have made notes of the pri- 
vate network interface ID when we were disabling 
source/destination checks). See Figure 11. 
¢ Click on the “Associations” tab and then associate 
this route table with your private subnet (10.0.2.0/24): 
Figure 12. 


Now your Amazon VPC configuration is completed and we 
can focus on configuration of our FreeBSD NAT instance. 
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Figure 9. Change Source/Destination Checks Settings 
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Figure 10. Disable Source/Destination Checks 
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Step 4. Configure Firewall for the FreeBSD Instance 

In this step we will enable and configure PF firewall. You 

need to connect to your FreeBSD instance via SSH as 

root and then create the /etc/pf.conf file: Listing 1. 
Then enable PF firewall: 


echo ‘pf enable="YES”’ >> /etc/rc.conf 

echo ‘pf rules="/etc/pf.conf”’ >> /etc/rce.conf 
echo ‘pflog enable="YES”’ >> /etc/rc.conf 
/etc/rc.d/pf start 


SH HEHE HE 


Reconnect via SSH. 


Step 5. Perform Basic Configuration of the 
FreeBSD Instance 
In this step we will configure our FreeBSD instance: 


¢ Configure network interfaces. 

¢ Add anew user. 

¢ Configure SSH daemon. 

¢ Update ports tree. 

¢ Install openntpd. 

¢ Enable SSHGuard. 

¢ Install tools to manage ports updates and monitor vul- 
nerabilities. 


Task 1. Configure network interfaces 
Make sure that your server can act as gateway: 


# echo ‘gateway enable="YES”’ >> /etc/rce.conf 
Configure the 2" network interface: 


# echo ‘ifconfig xnl="”inet 10.0.2.10/24”' >> /etc/rc.conft 
# ifconfig xnl inet 10.0.2.10/24 


To improve network performance, it is important to do 
the following: 


# echo ‘net.inet.tcp.tso=0' >> /etc/sysctl.conf 
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The following subnets have not been associated with any route tables and are therefore using the Mam table routes: 





Figure 12. Main Route Table Associations 
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Reboot the server: 

# reboot 

Task 2. Add a new user 

In order to add a new user, execute the following com- 


mand: 


# adduser 


Please do not forget to “invite” your new user to the 
wheel group. It is more convenient and secure to set- 
up SSH authorization via a DSA/RSA key than to type 
the password all the times. Let’s do it. Generate DSA 
keys: 


# su your new user 
6 ssh-keygen -t dsa 
% cat ~/.ssh/id dsa.pub >> ~/.ssh/authorized keys 





Listing 1. /etc/pf.conf file contents 


sca) LA 


exe Ee 
cere pete a inl d 
Thee is 0 


Void Bea eu 


ial lee C Veen eee ee 0 


table <private subnet> { 10.0.2.0/24 } 
table <firewall> { self } 

table <openvpn clients> { 10.8.0.0/24 } 
table <sshguard> persist 

table <jails> { 192.168.50.0/24 } 


Scrub som oll 


# NAT 

nat on $ext if inet from <private subnet> to !<firewall> 
ao eee abs) 

nat on $ext if inet from <jails> to !<firewall> -> (Sext_ 
ie) 

Mee (Old Sabiae IIe eae elolsieuyoim vellive mes le) “Ce eisiciee syblomreie> 


Se (odes i) 


block log all 

block ney guluck On Sexu ih POLO Ge, trom <songuacc- 
IZ Onn Ge Geen) peo ters sega = lee Sion 
DmiReEOmes 

block an quick on ovemit Srevo een, from: <ooiguard- 
tO (Sess se) eee ssi Ieee “esia 


bruteforce” 


# OpenVPN 
pags In) On Sext) ii aimee prove Udp from any to (ext 1£) 


port 1194 keep state 


# ssh 
Pass On CXUn is PLO ECO EO ocr In) Pore son tages 
S/SA keep state 





pass in on $vpn_ if proto tcp from <openvpn clients> to 
(Sext_ if) port ssh flags S/SA keep 


state 


# RDP 
Pass (1 ,On VOneIE proLo tcp from) —Openvpa clicnra co 
Swin_ host port 3389 flags S/SA keep 


state 


# ICMP to the external IP and OpenVPN gateway IP 

Lene, Gypess—) | Scclored™ Unresch ||” 

Pass inom -cxe 1h Ince proko Temp trom amy bo (>ext. 18) 
Tempe et / Powe VCMPnE Peo me comseaee 

Pass 11) On Svpnelt nee prove tcmp From any EO (2vpm 11) 
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# pass out this server 


pass out proto { tcp, udp, icmp } all keep state 


# pass out private subnet 

pass In On Sink li proce {7 eco, udp, cmp.) from 
“Private rsubneu? to! <iimeweall” iags 
5/ Sh 


# pass out jails 
pass in, on [loc if proto { tep, udp, icmp) from <jails- 


to !<firewall> flags S/SA 
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Then copy your private key (~/.ssh/id_ dsa) to your Mi- 
crosoft Windows workstation: 


¢ Print your private key in the SSH console: 
% cat ~/.ssh/id_dsa.pub 


¢ Copy the printed text into a new id_ dsa file on your 
Microsoft Windows workstation. 
¢ On your Windows workstation, convert ia dsa key to 
Putty .opk without a passphrase: 
¢ Download PuttyGen from htto://the.earth.li/Yo7Esgta 
tham/putty/atest/x86/puttygen.exe and open it. 
¢ Click the “Load” button, choose “All files (*.*)”, and 
select your id dsa key. 
¢ Click on the “Save Private Key” button. 


From now you are able to connect via SSH by using your 
private key instead of typing your password all the times. 


But before you do this, let’s change root’s password: 


% exit 


# passwd 

Task 3. Configure SSH Daemon 

Make sure that SSH daemon listens on 10.0.1.10 only. 
Add the following line to /etc/ssh/sshd_config: 
ListenAddress 10.0.1.10 


Also disable login for root user: 


-PermitRootLogin yes 


+PermitRootLogin no 

Then restart SSH: 

# /etc/rce.d/sshd restart 

Before you close the existing SSH session please check 
that you can open a new SSH session as the user you 


added in the previous task. 


Task 4. Update ports tree 
Update ports tree for the first time: 


# portsnap fetch extract 
In future you should update ports tree as below: 


# portsnap fetch update 
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Task 5. Install Openntpd 
OpenNTPD provides the ability to sync the local clock to 
remote NIP servers and can even act as NTP server it- 
self, redistributing the local clock. 

Configure ntpdate to set the clock at boot time: 


# echo ‘ntpdate enable="YES”’ >> /etc/rc.conf 


# echo ‘ntpdate hosts="pool.ntp.org”’ >> /etc/rc.conf 
Install net/openntpd: 


# cd /usr/ports/net/openntpd 
# make config-recursive 


+ thake install clean 


Default configuration will suffice for our purposes 
but feel free to edit /usr/local/etc/ntpd.conf if need- 
ed. Configure openntpad to start at boot time and start it 
now: 


# echo ‘openntpd enable="YES”"'’ >> /etc/rc.conf 


# /usr/local/etc/rc.d/openntpd start 


Task 6. Enable SSHGuard 
SSHGuard monitors server from its logging activity. When 
logs convey that someone is trying to brute-force SSH 
password, SSHGuard reacts by blocking this host. 

Let's first install it: 


# cd /usr/ports/security/sshguard-pf 
# make config-recursive 


# make install clean 


Your /etc/syslog.conf has been added a line for SSH- 
Guard; uncomment it and then reload the configuration: 


# /etc/rc.d/syslogd reload 


This command will display the set of addresses blocked 
in the SSHGuard table at any time: 


# pfctl -Tshow -tsshguard 
Task 7. Install tools to manage ports updates and 
monitor vulnerabilities 


Install ports-mgmt/portmaster: 


# cd /usr/ports/ports-mgmt/portmaster 


# make config-recursive 


You should check the box against the pxcncpatcsx option. 
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make install clean 

echo ‘WITH PKGNG=yes’ >> /etc/make.conf 

rehash 

pkg2ng 

cp /usr/local/etc/pkg.conf.sample /usr/local/etc/pkg.conf 


S$ + =S= SF [= 


Edit the /usr/local/etc/pkg.cont file: 


PACKAGESITE : http://pkgbeta.freebsd.org/S{ABI}/latest 


Install ports-mgmt/portaudit: 


cd /usr/ports/ports-mgmt/portaudit 
make config-recursive 
make install clean 


rehash 


S$ += S= SF [= 


portaudit -Fda 


Step 6. Apply Security Patches and Install a 
Custom Kernel 

FreeBSD is a very secure operating system. However, 
even in this operating system people sometimes find se- 
curity holes. (The list of released FreeBSD Security Advi- 
sories can be found on the http:/www.freebsd.org/secu- 
rity/advisories.html webpage). 

In this step we will patch all security holes by updat- 
ing source and then recompiling world and kernel. We 
cannot use freebsd-update utility for these purposes 
because FreeBSD AMIs use a custom kernel. To re- 
compile the kernel from sources is the only option we 
have. 


Task 1. Synchronise Source 
Setup configuration for csup utility: 


# cp /usr/share/examples/cvsup/standard-supfile 


/etc/freebsd-supfile 


Open /etc/freebsd-supfile and change the URL of your 
CVSUP server in the following line: 


*default host=CHANGE THIS.FreeBSD.org 

Note 

You can find the list of CVSUP servers on the FreeBSD 
website. 


Save the file and then update the source: 


# csup /etc/freebsd-supfile 
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Task 2. Apply EC2-specific Patches 

Some EC2-specific patches have been applied to Free- 
BSD in order to improve performance and work around 
bugs. These are contained in /root/ec2-bits/*.patch:! 


# ls /root/ec2-bits/*.patch 
/croot/ec2-bits/blkfront.patch 
/root/ec2-bits/ec2.patch 
/root/ec2-bits/tcp mbuf chain limit.patch 
/root/ec2-bits/uart.patch 


Let's apply this patches: 


# cd /usr/src 

# patch < /root/ec2-bits/blkfront.patch 

# patch < /root/ec2-bits/ec2.patch 

# patch < /root/ec2-bits/tcp mbuf chain limit.patch 
# patch < /root/ec2-bits/uart.patch 


Task 3. Create a Custom Kernel Configuration 

For 32-bit systems, you can find the kernel configuration 
files in the /usr/src/sys/i386/conf directory. However, we 
have installed a 64-bit operation system, so let's use con- 
figuration files from the /usr/src/sys/amd64/conf directory: 


# cd /usr/src/sys/amd64/conf 


| always strip kernel from the drivers | do not use. You 
should copy the ceneric file into my _GENERIc and then 
comment those drivers you do not need in your kernel. 
lf you do not know what to comment, then leave every- 
thing as it is. 


# cp GENERIC MY GENERIC 
# vi MY GENERIC 


Copy xENHv» into MYKERNEL file, then edit myKERNEL file: 


# Cp XENHVM MYKERNEL 
# vi MYKERNEL 


Make the following changes in the file (remove the red 
lines and add the green lines without “+”): 


-include GENERIC 
+include MY GENERIC 
-ident XENHVM 

+ident MYKERNEL 
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+# PF 
+device pf 
+device pflog 


+device pfsync 


Task 4. Build world and Custom Kernel 

Type the code from Listing 2. When running mergemaster 
for the first time, please select “d” (delete “temporary file”) 
for the following files: 


/etc/group 


/etc/master.passwd 


Note 
A“temporary file” is a file without your configuration chang- 
es, the one which comes with FreeBSD source. 


lf you are prompted for other files and these files contain 
some configuration changes you've done and you want 
to keep these changes, then you should also choose “d”. 
If you want to replace your amended files with the default 
version from FreeBSD source, then select “i”. 

When you are running mergemaster for the second time 
(after make installworld), do not delete ec2 ~* files from 


/etc/rc.d/ when asked. Same logic as above applies to 
this step. Be ready to answer many questions. It will take 
a while but you will be rewarded with the new custom ker- 
nel with all security patches applied. 


Step 7. Setup OpenVPN 

OpenVPN is a robust and highly flexible tunneling appli- 
cation that uses all of the encryption, authentication, and 
certification features of the OpenSSL library to securely 
tunnel IP networks over a single TCP/UDP port. 

We will use OpenVPN on our FreeBSD NAT instance 
to make only one port open to public (UDP 1194). Once 
connected and authenticated, legitimate OpenVPN users 
will get access to all the servers behind the NAT instance, 
subject to the NAT instance’s firewall rules. 


Task 1. Install OpenVPN 
Install OpenVPN from ports: 


# cd /usr/ports/security/openvpn 
# make config-recursive 


# make install clean 


Leave the default options when asked. 





Listing 2. Rebuilding “world” and custom kernel 


CG" ist snc 

make cleanworld && make cleandir 
make buildworld 

make buildkernel KERNCONF=MYKERNEL 
make installkernel KERNCONF=MYKERNEL 
reboot 

adjkerntz -i 

ile CRS ENCES 10) i) 

GGy/Usr/ sre 

make installworld 

make delete-old 

MeLrgena sire tr 0) 

reboot 

Caw Men sire 


make delete-old-libs 
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# mkdir /usr/local/etc/openvpn 


chmod 0755 /usr/local/etc/openvpn/easy-rsa/2.0/* 





Listing 3. Script to copy the sample configuration and the scripts to generate the certificates and keys 


cp /usr/local/share/doc/openvpn/sample-config-files/server.conf /usr/local/etc/openvpn 


# 
# cp -a /usr/local/share/doc/openvpn/easy-rsa /usr/local/etc/openvpn 
# 
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Listing 4. Script for generating certificate for a user 


# cp /usr/local/etc/openvpn/keys/ca.crt /usr/local/etc/ 
openvpn/easy-rsa/2.0/keys/ 

# cp /usr/local/etc/openvpn/keys/ca.key /usr/local/etc/ 
openvpn/easy-rsa/2.0/keys/ 


# ./build-key user _name.your.domain 


Listing 5. OpenVPN server configuration file 


# MS Fix for Apps like Remote Desktop 
fragment 1400 


ms sfix 


# Which local IP address should OpenVPN 
# listen on? (optional) 

Flecal ta 016.00 

koe cil nO 0 =e ik@ 


ca /usr/local/etc/openvpn/keys/ca.crt 
cert /usr/local/etc/openvpn/keys/openvpn. your.domain.crt 
key /usr/local/etc/openvpn/keys/openvpn.your.domain.key 


# This file should be kept secret 
dh /usr/local/etc/openvpn/keys/dh2048.pem 
# Push routes to the client to allow it 
# the server. Remember that these 
# private subnets will also need 
# to know to route the OpenVPN client 
P address pool) (L065 0107 255. 255.7557 0) 
# back to the OpenVPN server. 


pushes  ronte. Is 0 al 0255 2550s 
Cush sOulke 000s 20 255. 250.2 ee 


Certain Windows-specific network settings 
can be pushed to clients, such as DNS 

or WINS server addresses. CAVEAT: 
http://openvpn.net/faq.html#dhcpcaveats 


The addresses below refer to the public 


He a a 


DNS servers provided by opendns.com. 
eush  “dhep-opelon, DNS 826.424 - 
Fouche Chep.ooelon wDNGeZ0G mG ie 20R2 200 


# It’s a good idea to reduce the OpenVPN 

# daemon’s privileges after initialization. 
# 

# You can uncomment this out on 


# non-Windows systems. 


user nobody 


group nobody 


# Output a short status file showing 
# current connections, truncated 
# and rewritten every minute. 


status /var/log/openvpn-status.log 


By default, log messages will go to the syslog (or 
on Windows, if running as a service, they will go to 


the “\Program Files\OpenVPN\log” directory). 


“log” will truncate the log file on OpenVPN startup, 


while “log-append” will append to it. Use one 


# 
# 
# 
# Use log or log-append to override this default. 
# 
# 
# or the other (but not both). 

pO openvpn.log 


log-append /var/log/openvpn.log 


Listing 6. Configuration file for your Windows OpenVPN client 


client 

dev tun 

proto udp 

remote openvpn.your.domain 1194 
resolv-retry infinite 


nobind 


fragment 1400 


persist-key 


persist-tun 


Gar “d= pak to VWdimectony ca cert” 
cert “d:\\path\\to\\directory\\user name.your.domain. 
Gia. 


key “d:\\path\\to\\directory\\user name.your.domain.key” 


Comp=l 70 


verb 3 





BSD 


MAGAZINE 


26 | 


11/2012 





FreeBSD as a NAT Instance in Amazon Cloud 


Task 2. 
Generate Certificates and Keys for OpenVPN Server 
Now let’s create a configuration directory, and then copy 
the sample configuration and the scripts we will need to 
generate the certificates and keys for our server and cli- 
ents: Listing 3. 

You can edit /usr/local/etc/openvpn/easy-rsa/2.0/ 
vars file if you want to change the default RSA key size 
etc. 


# /bin/sh 
# cd /usr/local/etc/openvpn/easy-rsa/2.0 && . 
# ./clean-all 


if Vacs 


Then create the CA: 
# | /build-ca 


When asked for the “Common Name’ please specify the 
Fully Qualified Domain Name (FQDN) of your NAT in- 
stance (for example, openvpn.your.domain). 

Generate the server key file: 


# ./build-key-server openvpn. your.domain 


When asked for the “Common Name” please again 
specify the Fully Qualified Domain Name (FQDN) of 
your NAT instance. Leave the Challenge Password emp- 
ty. 

Create the Diffie-Hellman parameters: 


# ./build-dh 


This can take a while... 
Copy the files you've generated to the /usr/local/etc/ 
openvpn/keys directory: 


# mkdir /usr/local/etc/openvpn/keys 
# cp /usr/local/etc/openvpn/easy-rsa/2.0/keys/* 
/usr/local/etc/openvpn/keys 


Then clean up everything from the /usr/local/etc/ 
openvpn/easy-rsa/2.0/keys directory: 


# ./clean-all 


Task 3. Generate Certificates and Keys for OpenVPN 
Users 

Generate certificate for a user: Listing 4. When asked for 
the Name and email address, please specify user’s name 
and email address. 
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Each user certificate you create, needs to have a 
unique Common Name (CN). | usually use user name. 
your.domain format for the Common Name to ensure 
the uniqueness. Leave the Challenge Password empty. 
Execute the above command for each user which will 
be connecting to your OpenVPN server. After that you 
should copy the user certificates to the /usr/local/etec 
/openvpn/keys directory: 


# cp /usr/local/etc/openvpn/easy-rsa/2.0/keys/* 
/usr/local/etc/openvpn/keys 
# ./clean-all 


The files you need to provide to each user are: 


° Ca.creet 
. user Nname.your. domain crt 


* user name.your.domain.key 


You can transfer the files by using WinSCP in the SCP 
mode. The key file (user _ name.your.domain.key) IS accessi- 
ble only by root. So you will need to change its permissions 
to make the transfer. But do not forget to revert it back once 
you are done. The file must be readable only by root. 


Task 4. Configure OpenVPN server 

Now let’s configure OpenVPN server. Open /usr/local/ 
etc/openvpn/server.conf and make the following changes: 
Listing 5. Enable the OpenVPN server during boot-time: 


# echo ‘openvpn enable="YES”"’ >> /etc/rce.conft 
# echo ‘openvpn configfile=”/usr/local/etc/openvpn/server. 
conr”’’ >> fetc/re.cont 


# echo ‘openvpn if="tun”’ >> /etc/rc.conf 
Start manually the OpenVPN server: 


# /usr/local/etc/rc.d/openvpn start 


Task 5. Configure OpenVPN Clients 
Please find the configuration file for your Windows Open- 
VPN client in the Listing 6. 


Note 
The OpenVPN client can be downloaded from http://open- 
von.se/download.html. 


Step 8. Launch a Microsoft Windows Instance in 
the Private Subnet 

Let's launch a Microsoft Windows instance in the private 
subnet to test that our NAT instance works properly: 
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¢ Select the “EC2” item from the “Services” menu in 
the AWS Management Console, and then click on the 
“Launch Instance’ button. 

¢ Choose the “Classic” wizard. 

¢ Select “Microsoft Windows Server 2008 R2 Base 64-bit”. 

¢ On the next wizard step choose the instance type, 
click on “VPC” and select your private subnet 
(10.0.2.0/24) from the list. Click “Continue”. 

¢ Select “1” in the “Number of Network Interfaces” 
drop-down list, and setup 10.0.2.20 as the IP address 
(subnet 10.0.2.0/24). 

e You can assign secondary IP addresses if needed. 
Then click “Continue” button twice. 

¢ Name your instance and then click “Continue”. 

¢ Create a new key pair or choose the existing one, 
and then click “Continue”. 

¢ Create a new security group and open port 3389 for 
RDP (TCP). See Figure 13. 

¢ Click “Launch”. 


Step 9. Test Remote Desktop Connection to the 
Microsoft Windows Instance 
Now let’s retrieve Windows password: 


¢ Select the “EC2” item from the “Services” menu in 
the AWS Management Console. 
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Figure 14. Updated Security Group for NAT Instance 
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¢ Click on the “Instances” on the “Navigation” pane. 

¢ Right-click on the Windows instance and then select 
the “Get Windows Password” from the popup menu. 

¢ Select the private key and then click on “Decrypt 
Password’. 


Now you can connect to 10.0.2.20 using Remote Desk- 
top Connection. 

Please note that right now only RDC to the Microsoft 
Windows instance will work. Nothing else. If you try to 
browse the Internet or click on Windows Update, the Mi- 
crosoft Windows instance will not be able to access the 
Internet. All its outgoing traffic goes through NAT instance 
but there are no such rules in the “NatSG” security group. 

For example, to allow HTTP/HTTPS traffic from your Mi- 
crosoft Windows instance, you should add the following 
rules to the NAT instance’s security group: Figure 14. 


Conclusion 

| hope that this article will motivate you to use FreeBSD 
instances in Amazon Cloud. The only downside of this is 
so called “Microsoft Windows tax”. You probably have no- 
ticed that all FreeBSD AMls for non-cluster instances (at 
least the ones that are not EOL yet) run as Microsoft Win- 
dows instances. As the result, you have to pay more for 
using them. (You can find out more details about this on 
the = http:/www.daemonology.net/blog/2012-01-16-Free 
BSD-now-on-all-EC2-instance-types.html webpage). 
Let's hope that in the nearest future Amazon will provide 
a mechanism for running in HVM mode without being la- 
beled as a ‘Windows’ instance. 

And once again, many thanks to Colin Percival (a Free- 
BSD developer, a member of the FreeBSD Core team, 
and the FreeBSD Security Officer) for his work to get 
FreeBSD running in Amazon Cloud. 


VAND777 

Andrey aka vand777 has been passionate about technology and 
computers since his early teens. He works for a financial insti- 
tution in the UK. He likes to play with FreeBSD in his spare time 
running and managing approximately 5 FreeBSD operating sys- 
tem instances running name servers, mail servers, web servers, 
application servers, database servers, firewalls, etc. for some 
personal domains on the internet. 


11/2012 


solutions 





Open Source Systems 
Design - Administration - Consulting 


- design and administration of server farms 

- load balancing and high availability solutions 
- ZFS file servers and storage appliances 

- FreeBSD, OpenIindiana and Linux 


Our open source projects: 


VX ConnectBot: SSH and telnet client for Android 
mfsBSD: memory-resident FreeBSD installations 
zfs-stats: ZFS statistics tools 


Contact information: 


VX Solutions s. r. o. 
Mag. Martin MatuSka 
E-Mail: office@vx.sk 
Web: http:/Awww.vx.sk 





HOW TO 





PostgreSQL 


Indexes (Part 2) 


This article continues the previous one, presenting the readers with 
a few index examples and how the access costs are computed by the 
query planner. All the examples shown here have been tested ona 
PostgreSQL 9.1 cluster running on a FreeBSD 8.2-RELEASE machine; 
all the example source code is available in a GitHub repository. 


What you will learn... 
¢ which indexes can be defined on existing data 
« how to analyze the effectiveness of an index 


be used to explain the basis of database indexes and 

how PostgreSQL performs the cost computation. It is 
assumed that the articles table has just been created and 
populated as described in the previous article in this se- 
ries and that no custom indexes have been created. 

First consider a single-column-filtered query: get all the 
articles of average difficulty (i.e., difficulty = ‘AVG’) and 
see how PostgreSQL handles it (Listing 1). 

The executor wants to perform a sequential scan (Seq 
Scan) that will cost 51667 and will extract 663933 tuples 
out of the original relation. The first step to understanding 
the generated plan is Knowing how the system computed 
the number of output tuples: the system stores statistical 
data in the catalogues and make it available through the 
special view pg_ stats (see Listing 2). Such statistical data 
is fundamental for the executor to choose the execution 


| n this section a few example queries and indexes will 


Listing 1. A query when no indexes have been defined 


QUERY PLAN 


Seq Scan on articles 


Filter: (diriculiy —~ AVG’ = :bochar) 





bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty FROM articles WHERE difficulty = ‘AVG’; 


(cost=0.00..51667.00 rows=663933 width=38) 


What you should know... 

¢ basic shell commands 

¢ basic PostgreSQL concepts 

¢ server-side programming with PostgreSQL 


plan and the access method. In the case of the difficulty 
column the statistical data reports that there are only 3 
distinct values in the whole table and that such values are 
‘MIN’, ‘MAX’, and ‘AVG’, as expected (see the function in 
Listing 1). 

The most common vals and most common fregs are afr- 
rays whose values are related, and therefore the ‘MAX’ 
value appears with a frequency of 0.3359, the ‘MIN’ val- 
ue with a frequency of 0.332133 and the ‘AVG’ value 
with a frequency of 0.331967. This is the information re- 
quired to Know how many tuples the system is going to 
expect out of a query that filters on the difficulty = ‘AVG’ 
clause: 


output. tuples 
= 2000000 * 0.331967 
= 603934 


= Telation. Lotal tuples * Erequency 
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Now it is possible to compute the cost of the sequential 
scan: each data page of the target relation will be read 
with a weight of seq page cost and the system has 
to evaluate each tuple to see if it matches the selection 
condition, and therefore: 


seq scan.total cost = relation total pages * seq page cost 
+ relation. total tuples ~ (cpu tuple cost. + cpu 

Operator cost) 

26667 * 1 + 2000000 * 


51667 


(0..01+0.0025) 


That is exactly the cost reported by the query planner. 
In order to see how things change using indexes, build 
an index on the difficulty column as follows: 


bsdmagdb=# CREATE INDEX idx difficulty ON articles (difficulty) ; 


Similarly to regular data tables, since PostgreSQL 
manages indexes as relations, it is possible to see the 
amount of space occupied by an index: 


bsdmagdb=# SELECT relname, reltuples, relpages 

FROM pg class WHERE relname = ‘idx difficulty’ AND relkind = ‘i’; 
relname | reltuples | relpages 

ee et mem mee map eet wee me eet +—------—-—--—-—-4+4---------- 


10x difiiculty | Zet06 | 


PostgreSQL: Indexes (Part 2) 


The index has the same tuple count of the data relation, 
and this is because the index is full (that is covers each 
data tuple). However, since the index stores a lot less 
information with regard to the data table the number of 
pages Is less than that of the real data. 

As shown in Listing 3 the planner now decides to ex- 
ploit the newly created index and accesses the data ta- 
ble using such index. The first operation performed by 
the planner is a “Bitmap Index Scan’, that is a fancy 
way to say “walk the index but don't get out the tuples 
yet, just keep the pointers”; in fact the output width for 
such node is 0 (that is no tuples at all). In this first pass 
therefore the planner walks the index to see which tu- 
ples match the difficulty = ‘AVG’ clause and builds a bit- 
map of pointers to such tuples. For efficiency reasons, 
if the number of matching tuples increases, then the bit- 
map does not keep tuple pointers but page data point- 
ers, that is this pass is “lossy” and on the next step the 
planner Knows that the condition has to be checked 
again (hence the “Recheck Cond” on the second 
node). 

The second node reads the built bitmap and extracts 
the tuples out of the data table (the width now is not ze- 
ro): it is interesting to note that the rows number of the 
two nodes are exactly the same (as expected). It is pos- 
sible to compute the costs manually following similar 
steps as in the previous case: 





Listing 2. Listing 2. Statistical data used by the optimizer 


bsdmagdb=# \x 


PROM eg stats WHERE tablename — | arureles TAND atunance — 
tablename |) articles 

attname pecararic ules, 

Mees tine te | 3 


{MAX,MIN, AVG} 
Oto Oe slay Onesie) 


most common vals | 


most common freqs | 


Listing 3. The query plan using an index on the “difficulty” column 


QUERY PLAN 


Bitmap Heap Scan on articles 
Recheck Cond: (difficulty = ‘AVG’: :bpchar) 
[7 Sui Index Scan Ol lexea tiicuIEy 


Index Cond: (ditiiculty = “AVG’ s:bpchar) 





Dedmagdo=/; SHEECn Gablenane acchame,, Mdlseincl, moss sCOmmon vals, —most common uEeds 


PCaMaie icy 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty FROM articles WHERE difficulty = ‘AVG’; 


(cost=10977.91..45944.07 rows=663933 width=38) 


(cost=0.00..10811.92 rows=663933 width=0) 
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output tuples = relation total tuples * frequency 
= 2000000 * Os331L967 
= 663934 


The bitmap index scan has to traverse the whole index, 
and therefore has to read each data page of the index 
(sequentially, that is with seq _ page cost) and to apply 
a cpu operator on each of the tuples in the index; there- 
fore the results are: 


bitmap index scan = ({ index total pages * random page cost 
+ index total tuples * ( cpu_index tuple cost + 
Cpu operator cost ) } 


* Index. Selectivity 


= ( 4392 * 4 + 2000000 * ( 0.005 + 0.0025 ) ) * 
O23 31967 

= { L/s66 + T5000 ) * 0.531967 

= 10811 


and then comes the “Bitmap Heap Scan” cost (See Box 
2), that is rounded to the cost of sequentially retrieve 
pages (skipping not-bitmapped tuples) and applying the 
filter operator (recheck): 


bitmap heap scan = ( relation data pages * seq page cost ) 


+ { relation tuples * index selectivity * { cpu_ 


tuple, Cost + Cpu operator Cost } 

( 26667 * 1) + { 2000000 * 0.331967 * { 0.01 
-O50025 } 

2600) + 6299 

= 34966 


Note that the above is the cost of the “Bitmap Heap 
Scan” node which, as shown in Listing 3, has a final cost 
of around 45944 and an initial cost of around 10977 and 
therefore a single run cost of 45944 — 10977 = 34967 
(some rounding in the results is acceptable). It is inter- 
esting to note that the initial cost of the Heap Scan node 
is not exactly the same of the final cost of the Index 
Scan node but is slightly higher; this is due to the sort- 
ing of the obtained bitmap and to the re-arrangement of 
the in memory structures so that the next node can start 
running. 

Why, having an index on difficulty, does the system not 
use it to immediately retrieve tuples out of the target re- 
lation but performs this “fancy” two step algorithm? The 
idea is the index is not selective enough to justify a pure 
index scan (with the cost of random page cost) and that is 
better to build a pre-ordered map of which data pages are 
required and to access them sequentially. 

Supposing that the difficulty = ‘AVG’ is the normal data- 
base ‘workload’ (i.e., a query that needs to be optimized 





Listing 4. Creation of a partial index to match the workload query 


bsdmagdb=# CREATE INDEX idx difficulty avg 

ON articles (difficulty) 

WHERE difficulty = ‘AVG’; 

bsdmagdb=# SELECT relname, reltuples, relpages 

PROM egeclass 

WERE welnieames—= tex satiety lavg/ Al evel ind —— a7 

relname | reltuples | relpages 

eu Nee eet el eet cee tea cae ay a en 

Teen bie Wiha wanv.cial| 666666 | 


Listing 5. Executing the workload query with the partial index 


QUERY PLAN 


Bitmap Heap Scan on articles 


= Sl tiap Index scan, on idx di ficuluy save 


Index Cond: (difficulty = ‘AVG’ ::bpchar) 





bedmagdi)—; UPDATE jog Vindex SHY indisvalig:— raise WHERE wndexrelid = 10x Giticuley @;regclass, 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty FROM articles WHERE difficulty = ‘AVG’; 


(cost=11031.27..46046.61 rows=667867 width=38) 


(cost=0.00..10864.30 rows=667867 width=0) 
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since it is executed very often), then it is possible to im- 
prove performance by building a partial index that covers 
only this exact condition. Listing 4 shows the creation of 
the partial index and the size of the results: it is interesting 
to note that the index has now 1466 pages (against the 
4392 of the full one) and a lot less tuples. 

Listing 5 shows the execution plan to access the same 
data (i.e., difficulty = ‘AVG’) with the partial index in place. 
As readers can see the query plan has not changed, that 
is again a two step plan is selected, with a slightly differ- 
ent cost with respect to the previous case. The cost can 
be computed in a similar manner to the previous case, 
without having the index selectivity in the first node com- 
putation (since the index is 100% selective); what chang- 
es with respect to the previous case is the cost of the first 
node (Bitmap Index Scan): 


bitmap index scan = ( index total pages * random page cost 
+ index total tuples * ( cpu_index tuple cost + 


Cpu operator cost ) } 


PostgreSQL: Indexes (Part 2) 


( 1406 * 4 + 66606606 * ( 0.005 + 0.0025 ) ) 


= ( 5864 + 5000 ) 
= 10864 
bitmap heap scan = ( relation data pages * seq page cost ) 


+ ( relation tuples * index selectivity * { cpu_ 
tuple cost. + Cpu operator cost. ) 

( 26607 = 1 } + ( ZO00000 * 0.331967 * ( 0.01 
+ O,0025.)) ) 

26607 + 8299 


34966 


II 


Therefore, making a partial index on this particular query 
does not improve performance, and even if the resulting 
index is smaller, it still does not filter enough to be cho- 
sen as direct access method. 

The Bitmap Index Scan is often better than a pure Index 
Scan, since the latter could imply reading the index ran- 
domly more than one time, while the Bitmap Index Scan 
sorts the bitmap in order to allow for sequential data re- 
trieval. Therefore the Index Scan is used for very selective 





Listing 6. An Index Scan fora single row query retrieval 


QUERY PLAN 


index sean Using antrelessekey Onn annre kes 


Index Cond: (pk = 1) 


Listing 7. A two-tuples query that uses a Bitmap Index Scan 
QUERY PLAN 
Bitmap Heap Scan on articles 


((pk = 1) OR (pk = 1000000) ) 
(cost=8.87..8.87 rows=2 width=0) 


Recheck Cond: 
= lsukieite Oe 
=> Si tiaeeindex scan loneariveles ake 

Index Cond: (pk = 1) 
= PSL eiiciomindex SS cannon cde uke lest eke) 


Index Cond: (pk = 1000000) 


QUERY PLAN 


Seq Scan on articles 


Filter: 





bsdmagdb=# EXPLAIN SELECT title, abstract FROM articles WHERE pk = 1; 


(cost=0.00..8.44 rows=1 width=62) 


bsdmagdb=# EXPLAIN SELECT title, abstract FROM articles WHERE pk = 1 OR pk = 1000000; 


(cost=8.87..16.85 rows=2 width=62) 


(cost=0.00..4.43 rows=l1 width=0) 


(cost=0.00..4.43 rows=1 width=0) 


Listing 8. A query similar to the previous one but with a different set of clauses 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 
FROM articles WHERE difficulty <> ‘MIN’ AND difficulty <> ‘MAX’; 


(cost=0.00..56667.00 rows=883552 width=38) 
(de theulivys< MIN’ = 2bpochar) ANDI Vditiculiy <=] “MAX: bpebar)) 
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queries (just a few rows). As an example consider a query 
that retrieves exactly one tuple: since the system builds 
a BTree for the primary key, the query of Listing 6 shows 
an Index Scan. The cost can be roughly computed as a 
random page cost for the index page and the data page, 
and therefore almost 8. 

As an example, consider the query of Listing 7 that asks 
for two different rows via the table primary key: the optimiz- 
er chooses to perform a Bitmap Index Scan instead of an 
Index Scan. In such case the cost of the nodes can be com- 
puted in a very simple manner: having to access two dif- 
ferent index entries on different index pages cost random __ 
page_cost times two, and then comes the cost of retrieving 
the two data tuples (which could be in different pages), and 
therefore the overall cost is that of accessing four data pag- 
es at a random cost, and therefore 4 * 4 = 16. Then there 
is a little addition for the CPU costs and the need to do the 
OR between clauses, as shown in Listing 7. 

It is interesting to note that the query planner can be 
confused by having the same query run with the wrong 


set of clauses: consider the plan for the query of List- 
ing 8 that performs the same as the one of Listing 5 but 
without using the same indexes. The planner does not 
know that the {MIN, MAX} values are exclusive with re- 
gard to the {AVG} ones, and therefore performs a se- 
quential scan with the already explained cost. This ex- 
ample emphasizes how often the DBA work is not only to 
find out the best access method to the data, but also to 
rewrite a “bad” query into one that the planner can bet- 
ter understand. 

In order to explore more the PostgreSQL planner and 
index tools consider the creation of an index on the list- 
ings column and the query for all the tuples with the listing 
= 5 clause (with a frequency of 0.036633); this is a case 
very similar to the previous ones and as shown in Listing 
9 the resulting index and value frequencies lead to a Bit- 
map Heap Scan. 
bitmap index scan = ( index total pages * random page cost 


+ index total. tuples * 4. cpu index tuple. cost. + cpu. 





Listing 9. Creation of an index on the listings column 


bsdmagdb# CREATE INDEX idx listings 
ON articles ( listings); 
bsdmagdb=# SELECT relname, reltuples, relpages 
PROM pg sclass 
WHER Peeeel memes) bon Miko ine Ss: AnD relict a: 
relname | reltuples | relpages 
eee ie ee eee es ae a aS 
tx lastanes: || 


bsdmagdb=# \x 


101 


ipl elias NC Liaene 


most common freqs 


bsdmagdb=# \x 
QUERY PLAN 


Bitmap Heap Scan on articles 
Recheck Cond: (listings = 5) 
= Biltiee Index SsScanwon idx iis bings 


index Cond a listings = 5) 





PROMPeg estates Witkhe tablename — articles SANDeattnane — 

ae EN OE [De eae tar See Ses OSES ee eee eer Eee eee 
tablename [Parctrelles 

attname elas times 


bsdmagdb-=7 SELECT tabléname, altname, mn distinct, most common vals, most Common fregqs 
TITSEIng sis; 


OR 00 G7 0.0 Ss SUC, OO sr OG 7 pee sy les 3570 O.G 10 U OCG! Ul Ol OGm soy ea Oe SONG Oy Or) ss ontO 02 lal 


(COs a Ae CU Oem GerOne—T57 01 eWwaGEn—O7 ) 


(cost=0.00..1193.93 rows=73267 width=0) 
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operator cost ) ) * index selectivity 
( 2302 % 4 + ZO00000.* 4 0.005 + 0,0025...) 7 
0.036633 
( 17568 + 15000 ) 
= 1193 


II 


* 05036633 


bitmap heap scan = { relation data pages * seq page cost ) 
+ { relation tuples * index selectivity * { cpu. 
tuple cost + Cpu operator cost ) 
( 2606) > 1) + ( 2000000 % 0.030633 * { 0.01 
+ D,0025 ) } 
20667 + 916 
= 27583 


total. cost = 27583 + 1193 = 28776 


II 


How does the plan change if in a single query there 
are both clauses listings = 5 and difficulty = ‘AVG’? The 
planner decides to use a direct index access, as shown 
in Listing 10. Why does the planner not use a Bitmap 
Heap Scan on one of available indexes? If the plan- 
ner does, the index with the minimal cost would be the 
idx listings, which would have a total cost of near 
28776; then the cost for the filtering condition difficulty 
= ‘AVG’ has to be applied, and this results in a CPU op- 
eration applied on each output tuple of the index above, 
therefore increasing the cost of the whole access as 
follows: 


PostgreSQL: Indexes (Part 2) 


bitmap heap scan = 26776 + ( total data tuples * index 
selectivity * Cpu operator Gost } 


( 2000000: * 0.036622. * O.01 -] 


lI 


20/704 
29508 


lI 


which is greater than the cost of a pure Index Scan 
(28188). The latter can be approximated to the worst 
case that Is: 


index scan = random page cost * total index pages 
+ ( total index. tuples ~ cpu tuple andex cost] 
* ( Total date tuples * index selectayity * ( 
cpu _ tuple cost: } 
+ pW Operator cost * number of clauses ) 
Lises + 10000 + 1006 
28666 


II 


It is possible to create a compound index that filters on 
both the clauses, as shown in Listing 11. Supposing a 
Bitmap Heap Scan, the cost of using such index for the 
query of Listing 10 would result as follows (note that the 
selectivity is total and that the index can be read sequen- 
tially): 

bitmap index scan = ( ifidex total. pages. * seq page Cost 


+ index totel tuples * ( cpu index-tuple- cost + 





Listing 10. A query that has both the clauses 


FROM articles WHERE difficulty = ‘AVG’ and listings = 5; 
QUERY PLAN 


imdex Scan using dx diincubey avg on saneleles 
Index Cond: (dithculizy — "AVG! 22 bochax) 
Filter: (listings = 5) 


Listing 11. Creation of a compound index 


WHERE listings = 5 AND difficulty = ‘AVG’; 
bsdmagdb=# SELECT relname, reltuples, relpages 
BROMseguelass 


relname | reltuples | relpages 





bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 


(cost=0.00..28188.14 rows=24561 width=38) 


Dedmagdo=- CREAlH NDE ox Ono ly avg listings. . ON aruicles(diiiculey,  iseimgs) 


WHERE eeelniame —— tox Tdithewity avg iectings 5 SAND relkind)— 17 





www.bsdmag.org 


BSD :: 


MAGAZINE 





HOW TO 


2 * cpu_operator cost ) ) Therefore the total cost is nearly the same as the one 
= (185 * 1 + 66664 * ( 0.005 + 0.0025 ) ) obtained using another of previous indexes, and there- 
= ( 185 + 333 ) fore it is not worth creating this index for this data set. 
= Sie This is confirmed by Listing 12, that shows how the plan- 

bitmap heap scan = ( relation data pages * seq page cost ) ner decides to use the normal index instead of the com- 
+ ( index tuples * ( cpu tuple cost + 2 * cpu_ pound one and how forcing the usage of the compound 
operator cost ) index does not provide a great improvement in perfor- 
= ( 26667 * 1) + ( 66664 (0.01 + 2* 0.0025) ) mance. 
= 26667 + 1000 Plans with a single relation can be easy enough to read 
= 27667 and understand, but as more tables are involved in a set 
total cost = 27667 + 518 = 28185 of joins the plan can become very complex. In order to ex- 


Listing 12. The cost of the access with the idx_listings and idx_difficulty_avg_listings_5 indexes 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 
FROM articles WHERE difficulty = ‘AVG’ and listings = 5; QUERY PLAN 


Index scam Using tdx Githcullty avg on anrreles (cost—0 00 = 2elee. 14 nows—24 56) width—33) 
Index Cond: (difficulty = ‘AVG’: :bpchar) 
Filters i(listings:—- 5) 
bDedMmagda—7 seb nab leniide< seas 1Oemal se, 
bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 
FROM articles WHERE difficulty = ‘AVG’ and listings = 5; 
QUERY PLAN 


Bitmap Heap Scan on articles (cost=528.01..28100.18 rows=24561 width=38) 
Recheck Cond: ((difficulty = ‘AVG’::bpchar) AND (listings = 5)) 
= Sita index sean one Ga kiicwlt yy ave. Tistimec jee eest—0 00827130 | eovs—22 ol wii) 
Index Cond: ((difficulty = ‘AVG’::bpchar) AND (listings = 5)) 


Listing 13. A join query plan 
bsdmagdo=7 EXPLAIN SELECT al. titie, alllistings, al.diftiicuicy 
FROM articles al JOIN articles a2 ON al.pk = a2.pk 
WHERE al.difficulty = ‘AVG’ 
AND az listings — 5; 
QUERY PLAN 
Hash Join (cost=29961.92..72477.02 rows=24562 width=34) 
Hash Cond: (al.pk = a2.pk) 
=e nCex Cal UsticGm idx dt EnculiylavG nani teles val e(cost—0 00s. 26 5-le)) srows—o 0267  widtn—3 3) 
Index Cond: (datiiculty = “AVG’ = :bochar) 
=> Hash “(cest—28 795.062.2206 /95-08 rows—/(3267 wadtn—4) 
== Bitmap Heap Scan on areicies aZ (cost=-lZ17.24.728/95.08 rows=/3267 width—4) 
Recheck Cond: (listings = 5) 
= SUrmaps index scan, on dx ers bungs (cose —0. U0 aS. 9] hows— 1 3267 (wacdivla—() 
Index Cond: (listings = 5) 
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plain some join concepts consider a query that joins the 
articles table against itself in order to get all the articles 
with a difficulty = ‘AVG’ and a listings = 5 as shown in 
Listing 13. The plan starts from the Bitmap Heap Scan 
(or better from the Bitmap Index Scan subnode) on the 
idx_listings index for the relation a2 in order to get all the 
listings = 5 tuples; the cost is the same computed in the 
previous examples for this kind of plan. The Bitmap Heap 
Scan produces a set of tuples made only by the attribute 
of the join clause, that is the a2.pk attribute (as confirmed 
by the length of 4 bytes, an integer value). The keys are 
then placed into an hash map in a similar way to what 
happens during a Bitmap Heap Scan (first phase). At the 
same time, on the relation a7 an Index Scan is performed 
to extract all the difficulty = ‘AVG’ tuples; the cost is the 
same as computed in the previous examples. The out- 
put of this node is then compared with the whole hash 
map computed at the previous step in order to see if the 
tuple can be reported in output or not (see Figure 1). The 
cost of the whole plan depends on the work_mem size 
that should be large enough to keep in memory the whole 
hash map, otherwise multiple “batch” passes are required. 


Statistics Target 
The statistical data used by the query planner is computed 
depending on a configuration parameter called statistics 
target or statistics for short. Such parameter expresses 
how many data samples must be kept in the per-column 
statistics and therefore how the sample itself results accu- 
rate. Having a good statistical data is fundamental for the 
planner to take the right decision. 

In order to better explain how the statistics work con- 
sider the listings column of the article table. As shown 
In Listing 9 the most common vals and most common freqs 


PostgreSQL: Indexes (Part 2) 


for the listings column includes all the ten values with 
their distribution, and this is the result of having a statis- 
tics greater or equal to the number of possible values. By 
default the statistics on each column is set to a value of 
100 (since PostgreSQL 9), that means that no more than 
100 samples of data will be kept. It is possible to change 
the value of the samples to be kept using the aLTER TABLE 
ALTER COLUMN Command, as shown in Listing 14. For in- 
stance, setting the statics target to 5, as shown in Listing 
14, produces the effect that the most common values and 
most common freqs Of the listings column become a five- 
entries array therefore excluding some of the values that 
exist in the data set. 

The side effect of this wrong statistical target is that a 
query plan could lead to incorrect results: the same query 
of Listing 9 (with the clause difficulty = 5) is still execut- 
ed via the index idx_listings but the number of rows ex- 
pected as output is very different from the real value (as 
shown in Listing 15): the planner believes that no more 
than 10240 rows match the clause, while the execution 
shows that 73309 rows do. As readers can imagine, hav- 
ing wrong statistical data will lead to a wrong query plan 
estimation that can lead to the wrong access path selec- 
tion and therefore a non-optimal execution. 

Of course it does not make sense to increase the num- 
ber of statistical targets over the number of distinct values, 
and therefore for the listings column having more than 
101 values does not affect the query planning but could 
affect the time of the vacuum anaLyze and other anatyze 
commands. 


When and Where Placing an Index? 
The most difficult job of a DBA is to understand when and 
where an index (of any kind) can improve query execu- 
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Figure 1. A visual representation of the Hash Join (from PgAdmin III) 
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tion. It is worth reminding that each index requires main- 
tenance and can slow down vupDATE, INSERT @Nnd DELETE 
commands that need to modify such index. Therefore not 
used or not useful indexes should be removed from the 
database. 

The first step in deciding when and where an index 
could be required is to identify slow queries, or better, 
queries that can slow down the whole application. In 
fact, it could be that a query that is executed quite fast is 
repeated so many times on different data (not cached) 
that the whole application performance results bad. The 
database administrator has to “observe” the database 


while it is working to get the most frequent queries or 
those that are too slow. There are different tools, like 
pgfouine (see Box 3), that can help the DBA analyze the 
database logs and report which queries are most often 
executed. 

Out of the box PostgreSQL provides a configuration pa- 
rameter, log min duration statement, which can be used 
to log any query whose execution time requires more than 
the specified number of milliseconds. For instance setting 
the following: 


log min duration statement = 1000 





Listing 14. Decreasing the statistcs target 


bsdmagdb=# \x 


bsdmagdb=# VACUUM FULL ANALYZE articles; 


FROM articles WHERE listings = 5, 
QUERY PLAN 


Recheck Cond: (listings = 5) 


Index Cond: (listings = 5) 


Listing 16. An example of report on index usage 


PROMPeg estat ea lle indexes 


bsdmagdb=# ALTER TABLE articles ALTER COLUMN listings SET STATISTICS 5; 


bedmagdb=7 SELECT tablename, aliname, nm distinct, most common vals, most common freqs 


PROMPpg estates WikkEe tablename = articles SANDeatiname — = listings | 
tif OGIO NIB) Ne a Pane Sar Se = ea REESE a SOE eee Oe ee eee See 
tablename Peactreiles 

attname Peltstines 

MeOrs tine t Peeot 

MOS we OUMO my Vcr ii Ug, oy) 1 gAO) 


Mose COMMON PEGS ir 0 20720) Or Osea 067 7 On Ose 0E OS 16553, 05037 Toss 


Listing 15. A query plan that has the wrong number of output rows due to wrong statistical target 


bsdmagdb=# EXPLAIN ANALYZE SELECT title, listings, difficulty 


Bitmap Heap Scan on articles (cost=171.79..20035.29 rows=10240 width=34) 
(actual time=124.177..1872.909 rows=73309 loops=1) 


= Pita index scanven idx Wistings i cost—0 007. 69223) vows—)0240 widtin—0) 
(actual time=106.112..106.112 rows=73309 loops=1) 


Salbne P anidexgelmeame, scx sean, tex stUp read, idx tip neten 





WHERE relname = ‘articles’; 

~------------------------------ $----------}--------------4--------------- 
10x diiiiculey avg | 9 | 5A7 TSO a SA 7S ow 
idx listings | oy] 440064 | 5 
tdx vet ficwlby avg lastengs 59) Oy) Or 0 
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Box 1. Query hints 

PostgreSQL does not allow DBAs and users to tell the executor 
which path to choose in order to perform a particular query. This 
kind of executor suggestions are known as “query hints” and are 
usually available on commercial databases. The reason why Post- 
greSQL does not support them is that usually query hints are used 
in a very bad way: typically a DBA hints the executor to use an index 
because he thinks that such index will solve any performance prob- 
lem, but it turns out that the optimizer is better at doing the path se- 
lection job. Nevertheless, while it is not possible to hint the executor 
to use a specific index, it is possible to force it to not use an access 
method. This behavior is obtained using the tunables enable _ 
xxx that can be found in the postgresql.conf configuration file and 
that can be set also on the psql command line as follows: 


DeCmagGi— ot hwen =a homeec=s @ ama Ok ha, 
DeCmagGb— 7 wolienab Resedseate— Om 5, 


The available access method tunables are the following: 


enable bitmapscan 
enable hashagg 
enable hashjoin 
enable vimdexsean 
enable material 
enable merge on 
enable nesEloop 
emote Secsce 


emelo le syeicic 








enables tidsean 


Box 2. How is a Bitmap Heap Scan cost 


computed? 

Assuming the simplest case of a single index scan (i.e., no loops), the 

cost of a Bitmap Heap Scan is computed pretty much as follows: 

¢ compute the tuplesFetched on the data table as 
tuplesFetched=reltuples x indexSelectivity 

¢* compute the maximum number of pages of the data table T 
(one or greater); 

¢ compute the number of pagesFetched using the Mack- 
ert and Lohman formula, so that if the number of tuples be- 
comes greater the number of relation pages dominates, oth- 


erwise the number of tuples dominates: 
(2x T XtuplesFetched 
ce (2 T +tuplesFetched) 


¢ round pagesFetched so that it is not greater than T; 

- if pagesFetched is less than 2 (very small number of tuples, 
that is the number of pages are dominating) the data page 
access cost is randomPageCost, otherwise the data page ac- 
cess cost is computed with the following formula that has the 
lower limit at seqPageCost: 


costPerPage = randomPageCost —(randomPageCost — 


0.5 


pagesFetched ) 

a 

¢ so that the final I/O cost is done by the number of pages- 
Fetched multiplied by the costPerPage and the overall cost is 
done by the I/O cost and the CPU cost: 


segPageCost) X( 


totalCost =costPerPage x pagesFetched 
+(tuplesFetched X(cpuTupleCost +cpuOperatorCost: 


numOfWhereClauses )) 


For more info see the source code of the function cost bitmap _ 


the _ file 


heap scan in src/backend/optimizer/path/ 


COStLeiZeuc. 


Box 3. PgFouine 

PgFouine is a tool that helps administrators find problematic 
queries exploiting the database logs; in fact the project defines 
itself as a “log analyzer”. The software can be installed from the 
database/pgfouine ports or as a package as follows: 


# pkg add -r pgfouine 


It is important to instrument the PostgreSQL cluster to log data in 
a way that PgFouine can understand; at the time of writing PgFo- 
uine accepts logs from syslog, stderr and in CSV format. An easy, 
even if not recommended, setup of PostgreSQL for a quick use of 
PgFouine is as follows (file postgresql.conf): 


HOG) JS SSNstinene On = ecole” 
LegornGECoMlecr Ons son 

culene mode son 

leg) ence cOimy = yore) ee)” 
LOGmilenanes —s peseguesd malog: 
fog min Teluatwom Sseavemen: = 0 
Iker) slau ence = ee [ajo es [dla 

that will produce a file SPGDATA/pg_log/postgresql.log with 
all the executed statement. It is then possible to analyze the re- 
sult using PgFouine as follows: 


> /usr/local/bin/pgfouine -file $PGDATA/pg_ log/postgresql.log 
-format text -logtype stderr 


which will produce a text output similar to the following 
t#etee Overall statistics ##### 


Number of unique normalized queries: 2 


Number of queries: eZ 
Tobal query duratvon: 5:85 
t#tee Queries by type #####F 


SHLEC Is 2 100.0% 
t#tee Most frequent queries (N) ##### 

0 0s oh Cl COUNEG is hROMemaga zane; 
2 OOS) = SUE eL "COUNT (+) VEROM tarbielece 


Tt h Ht Tat tH 
260s) =a ELEC COUNT ( i EROMeareiewles, 
CC = 0 Sselher COUNT} 


Slowest queries (N) 
FROM magazine; 


The report from PgFouine shows that, since logging has been ac- 
tivated, the system has executed 12 SELECT statements and re- 
ports which ones were the most frequent and which the slowest. 
This information is very helpful for DBAs in order to catch proble- 
matic queries and their frequency. 
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On The Web 


PostgreSQL official Web Site: http://www.postgresgl.org 
ITPUG official Web Site: http://www. itpug.org 


PgFouine: http://pgfouine.projects.postgresql.org/ 





while produce a log entry like the following: 


LOG: duration: 1418.252 ms 
listings, difficulty FROM articles WHERE listings = 5 AND 
difficulty = ‘AVG’; 


statement: SELECT title, 


Depending on how fast such a query should run, the ad- 
ministrator can inspect further to see if an index can be 
used to improve the query execution speed. 

Having identified slow or problematic queries will let 
the administrator take decisions on which indexes to 
build. Here come into play all the other commands, like 
EXPLAIN and ANALYZE (to adjust Statistical data). It is pos- 
sible to disable one access method (see Box 1) in or- 
der to test if an index of a particular type can provide 
a real performance gain or to see why it is not used 
(i.e., the cost of using such index). Moreover, it is pos- 
sible to inspect the pg stat all index catalog to get in- 
formation about which indexes are used and how from 
the running system. Listing 15 shows an example of in- 
dex reporting through the pg _ stat _all_ indexes where it 
is possible to see that the partial index idx difficulty_ 
avg_ listings 5 has never been used since its creation 
and therefore could be removed without altering actual 
performances. 


Note 

Please note that values extracted from pg stat _all_ 
indexes Could be different depending on when the statisti- 
cal data is extracted and which queries have been execut- 
ed against the example data table). 


The idx scan column reports the total Index Scan, 
the idx tup fetch is the number of tuples retrieved 
through the index, and idx tup_ read Is the index tu- 
ples read by index usage. All the numbers are abso- 
lute, that is are counting since the index creation. For 
instance, as of Listing 16, the idx listings has been 
used mainly for Bitmap Index Scans, since the idx _ 
tup fecth IS almost zero and the idx tup_ readisa lot 
greater; in other words the index has been walked but 
no data tuple has been retrieved directly using index ref- 
erences. On the other hand idx difficulty avg has 
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PostrgeSQL Explain Documentation: http://www.postgresql.org/docs/current/static/sql-explain.html 
GitHub Repository containing the source code of the examples: https://github.com/fluca1978/fluca-pg-utils 





been used for Index Scans, since the number of index 
tuples (i.e., index walking) and the number of data tuples 
retrieved coincide. 


Summary and Coming Next 

This article completed the glance at indexing and cost 
computation. Main tools to understand which decisions 
the query planner took to execute a query has been 
shown. It is worth noting that finding the query optimiza- 
tion problem is very complex and hard to solve, and ex- 
perience is invaluable in this scenario. In the next articles 
other PostgreSQL features will be shown. 
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FreeBSD Enterprise 


Search with Apache Solr (Part 3) 


Continuing with our series on Apache Solr we will look at 
crawling the bsdmag.org website with nutch. 


What you will learn... 
¢ How to set up Apache Nutch to crawl websites and inject the con- 
tent into a MySQL database ans search with Solr 


ne of the important facets of enterprise search is 
OC) to be able to search internal (Intranet) and exter- 
nal websites. On a smaller scale, it is relatively 
trivial to assemble some code in PHP or Perl to pull web 
pages from a site, extract the links from the HTML and 
then “wash, rinse, repeat”. The difficulty arises when we 
want to index, rank, and effectively manage these results 
on a large scale. Almost 10 years ago, Apache Nutch was 
developed as the key technology to crawl 100 million web- 
pages, and has proved time and again that it is an efficient 
scalable solution. Nutch can be clustered, it is robots.txt 
friendly, and using modular plug-ins and schemas, can be 
tuned to bias certain results first. While Nutch integration 
and tuning Is quite specialized, it is fairly trivial to config- 
ure Nutch to dump results of a crawl session into MySQL 
(or any other JDBC based database for that matter), and 
rank / review these queries in Solr. 
For more information about Apache Nutch, please see 
the Nutch Wiki, which can be found at http://wiki.apache. 
org/nutch. 


Requirements 

We will continue to develop our Solr installation from the 
last article. Another core will be added, and test data im- 
ported from a MySQL database. While | will be using the 
vi editor, substitute the editor of your choice if preferred 
(e.g. emacs, nano etc.). 
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What you should know... 
¢ BSD administration skills, FreeBSD Apache Solr Part 1 and Part 2 arti- 
cle 


Step 1. Get the Files 


S su 


# pkg add -r apache-ant 
If you see: 


pkg add: warning: package “‘apache-ant-1.8.2 1” requires 

‘javavmwrapper-2.3.5’, but ‘javavmwrapper-2.4’ is installed 

pkg_add: can’t open dependency file ‘/var/db/pkg/jdk- 
1.6.0.3p4 27/+REQUIRED BY’! 


dependency registration is incomplete 
This dependency error can be ignored since apache ant 


runs under javavmwrapper-2.4. 
Install wget and bash if they are not already installed: 


# pkg add -r wget 
# pkg add -r bash 


Download Nutch: 


=He 


mkdir /tmp/nutch 


Se 


cd /tmp/nutch 


=e 


wget http://mirror.rmg.io/apache/nutch/2.1/apache- 


nveCh=Z sc 1l—sre. tar .o2 


=He 


tar -xvzf apache-nutch-2.1-sre.tar.gz 
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Step 2. Configure MySQL 


# /usr/local/etc/rce.d/mysql-server stop 
# cd /var/db/mysql 


Restart MySQL and check for UTF support (Figure 1). 


# mv my.cnf my.cnf-000 
# cp /usr/local/share/mysql/my-small.cnf ./my.cnf 


# vi my.cnf 
Add under [mysqld] section: 


character-set-server = utf8 
cOllation-server=uLis unicods ‘ci 

# /usr/local/etc/rc.d/mysql-server start 
# mysql -uroot 

mysql> \s 


Quit from MySQL 
mysql> \q 


Create a nutch.sql command file with the following con- 
tents: Listing 1. 
Create the database used for Nutch: 


# mysql < nutch.sql 


Step 3. Configure Solr 
Stop Tomcat and create a new collection 


# /usr/local/etc/rc.d/tomcat7 stop 
# cd /home/solr 


# cp -R collection2 collection3 


Edit solr.xm1 to reflect the new collection by adding the 
following lines: Listing 2. 
Flush the index data: 





Figure 1. MySQL with UTF8 support 
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# rm collection3/data/index/* 
# rm collection3/data/tlog /* 


# cd collection3/conf 


Modify the following files: 








Listing 1. SQL 


CREATE DATABASE nutch; 

USid, obec la7 

CREATE TABLE ‘webpage’ ( 

“1d! varchar( 7/67) "CHARACTER SEY Latin’ NOM NUL, 
‘headers’ blob, 

‘text’ mediumtext DEFAULT NULL, 

Tstatus sunt (Ely SOE Aine Usain, 

‘markers’ blob, 

"parseotatus” “blob, 

‘modifiedTime’ bigint(20) DEFAULT NULL, 

‘score’ float DEFAULT NULL, 

‘typ’ varchar(32) CHARACTER SET latinl DEFAULT NULL, 
‘baseUrl’ varchar(512) CHARACTER SET latinl DEFAULT NULL, 
‘content’ longtext DEFAULT NULL, 

‘title’ varchar( 2046), DEFAULT NULL, 

‘reprUrl’ varchar(512) CHARACTER SET latinl DEFAULT NULL, 
YEeuCchiIncerval’  inc(ll) DEFAULY NULL: 
‘prevFetchTime’ bigint(20) DEFAULT NULL, 

‘inlinks’ mediumblob, 

‘prevSignature’ blob, 

Youtlinks’ mediumblob, 

‘fetchTime’ bigint(20) DEFAULT NULL, 
‘retriesSinceFetch’ int(1l1) DEFAULT NULL, 
ToEOroCo lo eauds solo, 

‘signature’ bilob, 

‘metadata’ blob, 

PRIMARY KEY (aid ) 

) ENGINE=InnoDB DEFAULT CHARSET=utf8; 


GRANT SELECL, INSER LE, UPDALE ON muTchs> =O 
“nuteh @ tocelhose DENTE RED Bx 


“password” | 


Listing 2. XML 


<core schema="schema. xml” 
instanceDir="/home/solr/collection3/” 
name="collection3” 
config="solrconfig.xm1” 
dataDir="/home/solr/collection3/data” 

Fe 
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data-config. xml 
This should contain the following: Listing 3. 


schema.xml 

Remove everything between the <fields> tags and de- 
lete all <copyField> configuration so that the schema.xml 
looks like: Listing 4. 


solrconfig.xml 
Modify the data import handler thus: 


<str name="config”>/usr/home/solr/collection3/conf/data- 


contig, xmil</stir> 











Figure 2. Solr empty collection 3 
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Figure 3. Solr collection 3 schema 
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Figure 4. So/rconfig.xml for collection3 
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Ensure that the /etc/hosts file has a valid host name as- 
signed to local host otherwise nutch will fail: 


127 0 Oe I localhost solr 


Reboot: 


# reboot 


Check that the hosts file is correct: 
$ ping -t3 solr 


You should get a reply. 

Check that you have a collection with no documents in- 
dexed by accessing the solr management interface with 
your browser at http://vouripaddress:8080/solr (replace 
youripaddress with the IP address of your SOLR box). 

The schema and data import handler should look like 
this: Figure 3. 





Listing 3. XML 2 


<dataConfig> 


<dataSource driver="com.mysql.jdbc.Driver” 
url=" jdbcrmysql:// localhost, nurch” 
user="nutch” 
password="password” 


is 


<document mame— “webpage “> 


<entity name="webpage” 
query="select ID, TITLE, STATUS, BASEURL, CONTENT,M 
ETADATA from webpage”> 
<field column="ID” name="id” /> 

<field column="TITLE” name="title” /> 

<field column="STATUS” name="status” /> 

<field column="BASEURL” name="baseurl” /> 

<field column="CONTENT” name="content” /> 

<field column="METADATA” name="metadata” /> 
</entity> 


</document> 


</dataConfig> 
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Step 4. Build and Configure Nutch 


> Su 


# cd /tmp/nutch/apache-nutch-2.1 


Edit the ivy/ivy.xm1 file removing comments on lines 
105 and 107 enabling MySQL support so line 105 is as 
follows: 


<dependency org="mysql” name="mysgl-connector-java” 


reyv="5.1.18”" :cont="*—Sderault” /> 


Edit the conf/gora.properties file and comment out the 
default Sqlstore properties, adding MtSQL as the data 
store so the file reflects: Listing 5. 

Copy the nutch defaults across for our site crawl: 


cp conf/nutch-default.xml conf/nutch-site.xml 





Edit line 63 to reflect a sensible user agent as this can- 
not be empty: 


<value>FreeBSD Test</value> 
Build nutch: 
# ant runtime 


This will take a few minutes as the jar files etc. are down- 
loaded and nutch built. 


# cp -R runtime/local /home/solr/nutch 


# cd /home/solr/nutch 
Edit the first line of bin/nutch to match the following: 


#!/usr/local/bin/bash 





Listing 4. XML 3 


<fields> 


</ tlelas= 


<uniqueKey>id</uniqueKkey> 


<copyField source="id” dest="text”/> 
<copyField source="title” dest="text”/> 
<copyField source="status” dest="text”/> 
<copyField source="baseurl” dest="text”/> 
<copyField source="content” dest="text”/> 


<copyField source="metadata” dest="text”/> 


Listing 5. XML 4 


#gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver 


#gora.sqlstore.jdbc.user=sa 


#gora.sqlstore.jdbc.password= 


gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver 
Gora. sqlstore.jdoc.url—jdbe mysql: //local host +3306/nucen? 
gora.sqlstore.jdbc.user=nutch 


gora.sqlstore.jdbc.password=password 





CieiGdineme= ext Styoe= Lexe General swiidexed— Eiile @Stoted= thie, 


<field name="id” type="text general” indexed="true” stored="true” /> 
<field name="title” type="text general” indexed="true” stored="true” /> 
<field name="status” type="int” indexed="true” stored="true” /> 

<field name="baseurl” type="text general” indexed="true” stored="true”/> 
<field name="content” type="text general” indexed="true” stored="true”/> 


<field name="metadata” type="text general” indexed="true” stored="true”/> 


multivalued="true’/> 


#gora.sqistore.jdbc.url=jdbc:hsqldb:hsql://localhost/nutchtest 


\ createDatabasel fNotExist=true 








www.bsdmag.org 


BSD :: 


MAGAZINE 


HOW TO 


Add sava_ Home at the beginning of the file before 


cygwin=false 
JAVA_HOME=/usr/local 


Set up the nutch seed file (Replace http://bsdmag.org/ if 
desired): 


# mkdir -p urls 
# echo 'http://bsdmag.org/' > urls/seed.txt 
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Figure 5. Successful data import for collection3 
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Figure 6. Search query for Apache Solr 
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Figure 7. Content from http://bsdmag.org 
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Step 5. Crawl! 
# bin/nutch crawl urls -depth 3 -topN 5 > & logs/crawl.log 
Open another terminal: 


S$ cd /home/solr/nutch/logs 
S su 


# tail -f crawl.log 
You should see the site being crawled: 


fetching http://bsdmag.org/magazine/1815-network-security-auditing 
10/10 spinwaiting/active, 5 pages, 0 errors, 0.3 0.2 
pages/s, 57 54 kb/s, 5 URLS in 1 queues 

fetching http://bsdmag.org/news 

10/10 spinwaiting/active, 6 pages, 0 errors, 0.2 0.2 pages/s, 
61 78 kb/s, 4 URLs in 1 queues 


* queue: http://bsdmag.org 


maxThreads = 1 

inProgress = 0 

crawlDelay = 5000 
minCrawlDelay = 0 
nextFetchTime = 1348457886871 
now = 1348457884591 


0. http://bsdmag.org/newsletter 
1. http://bsdmag.org/take-part 


When complete (About 7 minutes as we are not crawling 
with any depth) run the following: 


# mysql -unutch -ppassword 
mysql> use nutch; 
mysql> select count(*) from webpage where status = 2; 


This should show the number of successfully fetched pages. 


Conclusion 

This is a very crude configuration, but should be enough to 
get you started. More attention needs to be paid to the sche- 
ma files (in particular to clear out redundant cruft and to op- 
timize). 
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Register Today! | 


November 3% &4" 
The event is being held at in Sunnyvale, CA 


Registration is available at 


MeetBSD California 2012 promises to be an experience unlike any other. 


MeetBSD California is not your average conference - it’s a meeting of the minds from all over 
the BSD community. MeetBSD California 2012 will feature community - driven break - out sessions, 
discussion groups, and 5-10 minute “lightning talks,” as well as longer talks from seasoned BSD experts. 


W @MeetBSDCA www.facebook.com/MeetBSDCalifornia 


